Option Pricing, Interest Rates and Risk Management This handbook presents the current state of practice, method and understanding in the field of mathematical finance. Every chapter has been written by leading researchers and each starts by briefly surveying the existing results for a given topic, then discusses more recent results and, finally, points out open problems with an indication of what needs to be done in order to solve them. The primary audiences for the book are doctoral students, researchers and practitioners who already have some basic knowledge of mathematical finance. In sum, this is a comprehensive reference work for mathematical finance and will be indispensable to readers who need to find a quick introduction or reference to a specific topic, leading all the way to cutting edge material.

HANDBOOKS IN MATHEMATICAL FINANCE

Option Pricing, Interest Rates and Risk Management

Edited by

E. Jouini, Université Paris – Dauphine and CREST

J. Cvitanić, University of Southern California

Marek Musiela, Paribas, London

PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE

The Pitt Building, Trumpington Street, Cambridge, United Kingdom

CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014, Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa
http://www.cambridge.org

© Cambridge University Press 2001

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2001
Reprinted 2004

Printed in the United Kingdom at the University Press, Cambridge

Typeface Times 11/14pt. System LaTeX 2ε [DBD]

A catalogue record of this book is available from the British Library

Library of Congress Cataloguing in Publication Data
Advances in mathematical finance / edited by E. Jouini, J. Cvitanić, Marek Musiela.
p. cm.
Includes bibliographic references and index.
ISBN 0 521 79237 1
1. Derivatives securities–Prices–Mathematical models. 2. Interest rates–Mathematical models. 3. Risk management. 4. Securities–Mathematical models. I. Jouini, E. (Elyès), 1965– II. Cvitanić, J. (Jaksa), 1962– III. Musiela, Marek, 1950–
HG6024.A3 A38 2001
332 .01 51–dc21 00-052911

ISBN 0 521 79237 1 hardback

Contents

List of Contributors
Introduction

Part one: Option Pricing: Theory and Practice
1 Arbitrage Theory
   Yu. M. Kabanov
2 Market Models with Frictions: Arbitrage and Pricing Issues
   E. Jouini and C. Napp
3 American Options: Symmetry Properties
   J. Detemple
4 Purely Discontinuous Asset Price Processes
   D. B. Madan
5 Latent Variable Models for Stochastic Discount Factors
   R. Garcia and É. Renault
6 Monte Carlo Methods for Security Pricing
   P. Boyle, M. Broadie and P. Glasserman

Part two: Interest Rate Modeling
7 A Geometric View of Interest Rate Theory
   T. Björk
8 Towards a Central Interest Rate Model
   A. Brace, T. Dun and G. Barton
9 Infinite Dimensional Diffusions, Kolmogorov Equations and Interest Rate Models
   B. Goldys and M. Musiela
10 Modelling of Forward Libor and Swap Rates
   M. Rutkowski

Part three: Risk Management and Hedging
11 Credit Risk Modelling: Intensity Based Approach
   T. R. Bielecki and M. Rutkowski
12 Towards a Theory of Volatility Trading
   P. Carr and D. Madan
13 Shortfall Risk in Long-Term Hedging with Short-Term Futures Contracts
   P. Glasserman
14 Numerical Comparison of Local Risk-Minimisation and Mean-Variance Hedging
   D. Heath, E. Platen and M. Schweizer
15 A Guided Tour through Quadratic Hedging Approaches
   M. Schweizer

Part four: Utility Maximization
16 Theory of Portfolio Optimization in Markets with Frictions
   J. Cvitanić
17 Bayesian Adaptive Portfolio Optimization
   I. Karatzas and X. Zhao

Contributors

G. Barton, Department of Chemical Engineering, University of Sydney, Sydney, Australia.
T. Bielecki, Department of Mathematics, The Northeastern Illinois University, Chicago, USA.
T. Björk, Department of Finance, Stockholm School of Economics, Box 6501, S-11383 Stockholm, Sweden.
P. Boyle, School of Accountancy, University of Waterloo, Waterloo, Ontario N2L 3GI, Canada.
Alan Brace, FMMA and NAB, PO Box 731, Grosvenor Place, Sydney 2000, Australia.
M. Broadie, Graduate School of Business, Columbia University, New York, NY 10027, USA.
P. Carr, Morgan Stanley, 1585 Broadway, 6th floor, New York, NY 10036, USA.
J. Cvitanić, Department of Mathematics, University of Southern California, 1042 West 36th Place, Los Angeles, CA 90089-1113, USA.
J. Detemple, School of Management, Boston University, 595 Commonwealth Avenue, Boston, MA 02215, USA.
T. Dun, Department of Chemical Engineering, University of Sydney, Sydney, Australia.
R. Garcia, Département de Sciences Économiques, Université de Montréal, Montréal (PQ) H3C 3J7, Canada.
P. Glasserman, Columbia Business School, Columbia University, New York, NY 10027, USA.
B. Goldys, School of Mathematics, University of New South Wales, Sydney, 2052 NSW, Australia.
D. Heath, University of Technology, Sydney, School of Finance & Economics, PO Box 123, Broadway, 2007 NSW, Australia.
E. Jouini, Université Paris IX Dauphine, CEREMADE, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France.
Yu. M. Kabanov, Laboratoire de Mathématiques, Université de Franche-Comté, 16 Route de Gray, F-25030 Besançon, Cedex, France.
I. Karatzas, Departments of Mathematics and Statistics, Columbia University, New York, NY 10027, USA.
D. Madan, College of Business and Management, University of Maryland, College Park, MD 20742, USA.
M. Musiela, Paribas, 10 Harewood Avenue, London NW1 6AA, UK.
C. Napp, Université Paris IX Dauphine, CEREMADE, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France.
E. Platen, University of Technology, Sydney, School of Finance & Economics, PO Box 123, Broadway, 2007 NSW, Australia.
É. Renault, Département de Sciences Économiques, Université de Montréal, Montréal (PQ) H3C 3J7, Canada.
M. Rutkowski, Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-661 Warsaw, Poland.
M. Schweizer, Technische Universität Berlin, Fachbereich Mathematik, Strasse des 17. Juni 136, D-10623, Berlin, Germany.
X. Zhao, Departments of Mathematics and Statistics, Columbia University, New York, NY 10027, USA.

Introduction

This book, the final in a series of stand-alone works, is a collection of invited papers that represent the current state of research in the field of Mathematical Finance, as seen by leading researchers in the field. Some of the contributed articles survey the existing results for a given topic, some discuss and present new research, some point out open problems and future directions, while many do all of the above. While effort was made to cover most of the important topics in the field, the book is not meant to be encyclopedic in nature. The outcome was ultimately influenced by the present scientific interest of the contributors and the editors. The primary audience is researchers in academia and industry who already have some basic knowledge of the field. This book might serve as a quick introduction to a specific topic, leading to recent results and open problems. It can also serve as valuable reference material.

The first Part focuses on the theory and practice of pricing derivative securities. The paper “Arbitrage theory” by Yu. M. Kabanov considers models where an investor, acting on a financial market with random price movements and having a given time horizon, transforms his initial endowment into a certain terminal wealth. In this framework, the author addresses the following question: does the investor have arbitrage opportunities, i.e. non-risky profits? The article examines and answers this question in different frameworks: one-step and multi-step models with finite space of possible states of the world, discrete-time models with infinite space of possible states of the world, continuous time models, semimartingale models, large financial markets and models with transaction costs.

The article “Market models with frictions: arbitrage and pricing issues” by E. Jouini and C.
Napp extends the previous results in two directions: first, they consider investment opportunities determined by their cash-flows instead of financial assets described by their price processes. This approach enables them to take into account classical market models as well as investment models. Second, the authors consider a wide range of possible market imperfections: transaction
costs, borrowing costs and constraints, short-selling costs and constraints, fixed and proportional transaction costs and models with defaultable numéraire. In all these cases, they characterize the no-arbitrage assumption through a unified approach and they apply these results to pricing and hedging issues.

The contribution by J. Detemple “American options: symmetry properties” surveys generalizations of the classical put–call symmetry: the value of a put option with strike price K on an underlying asset S paying dividends at rate δ in a financial market with riskless interest rate r is the same as the value of a call option with strike price S on an asset paying dividends at rate r and having initial value K, in an auxiliary financial market with interest rate δ. It is shown that the symmetry holds in a large class of models, including non-Markovian markets with random coefficients, and even for many nonstandard American claims including barrier options, multi-asset derivatives, and occupation time derivatives. The main tool, the change of numéraire technique, is also reviewed and extended to the case of dividend-paying assets. The put–call symmetry reduces the computational burden in pricing options; it provides useful insights into the economic relationship between contracts, and sometimes even helps to reduce the dimensionality of the problem, thereby making somewhat more tractable the difficult problem of evaluating American contingent claims.

The article “Monte Carlo methods for security pricing” by P. Boyle, M. Broadie and P. Glasserman, reprinted from Journal of Economic Dynamics and Control, is a detailed survey of simulation methods applied to numerical pricing of European, and, more recently, American options. Since European option prices can be calculated as expected values, it is natural to use Monte Carlo for computing them.
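As a rough illustration of pricing a European option as an expected value, the sketch below estimates a call price by simulation and compares it with the closed-form Black–Scholes value. It is only a sketch under stated assumptions: the lognormal (Black–Scholes) dynamics, the parameter values, and the function names `black_scholes_call` and `mc_call_antithetic` are chosen here for illustration and are not taken from the surveyed paper. Each normal draw is used together with its mirror image (an antithetic pair), one standard variance reduction device.

```python
import math
import random


def black_scholes_call(s0, strike, rate, sigma, maturity):
    """Closed-form Black-Scholes price of a European call (benchmark only)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t

    def norm_cdf(x):
        # Standard normal distribution function via the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def mc_call_antithetic(s0, strike, rate, sigma, maturity, n_pairs, seed=0):
    """Monte Carlo estimate of the call price using antithetic pairs (z, -z).

    Hypothetical helper: assumes lognormal terminal prices under the
    risk-neutral measure, so one normal draw per path is enough.
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        payoff_plus = max(s0 * math.exp(drift + vol * z) - strike, 0.0)
        payoff_minus = max(s0 * math.exp(drift - vol * z) - strike, 0.0)
        # Average the payoff over the pair before accumulating.
        total += 0.5 * (payoff_plus + payoff_minus)
    return discount * total / n_pairs


if __name__ == "__main__":
    exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    estimate = mc_call_antithetic(100.0, 100.0, 0.05, 0.2, 1.0, n_pairs=50000, seed=1)
    print(f"closed form: {exact:.4f}, simulated: {estimate:.4f}")
```

Dropping the mirrored draw and averaging `payoff_plus` alone gives the plain Monte Carlo estimator.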
However, this can often be quite slow, and this paper reviews and compares different methods used to improve the efficiency of Monte Carlo methods. So-called “variance reduction” techniques are surveyed, including control variates, antithetic variates, moment matching, importance sampling and conditional Monte Carlo methods. Next, the quasi-Monte Carlo approach is reviewed, in which, instead of random numbers, deterministic sequences are generated – so-called quasi-random numbers or low-discrepancy sequences. These are more evenly dispersed than random sequences. It is interesting that these procedures are typically based on number-theoretic methods. The paper also discusses the use of Monte Carlo methods for computing sensitivities (“Greeks”) of the option price with respect to different parameters, and the difficult problem of computing American option prices using simulation. The difficulty stems from the fact that the price of an American option is a maximum of expected values, rather than a single expected value.

In their chapter, R. Garcia and É. Renault use the concept of stochastic discount factor (SDF) or pricing kernel as a unifying principle to integrate two concepts
of latent variables, one cross-sectional, one longitudinal, in order to reduce the dimension of a statistical model specified for a multivariate time series of asset prices. In the CAPM or APT beta pricing models, the dimension reduction is cross-sectional in nature, while in time-series state-space models, dimension is reduced longitudinally by assuming conditional independence between consecutive returns given a small number of state variables. They provide this unifying analysis in the context of conditional equilibrium beta pricing as well as asset pricing with stochastic volatility, stochastic interest rates and other state variables. They address the general issue of econometric specifications of dynamic asset pricing models, which cover the modern literature on conditionally heteroskedastic factor models as well as equilibrium-based asset pricing models with an intertemporal specification of preferences and market fundamentals.

D. Madan, in his contribution “Purely discontinuous asset price processes”, surveys his work with various co-authors on modeling asset prices with pure jump processes, and on pricing contingent claims in such models. It is argued that statistical analysis leads to the consideration of discontinuous asset price models, in which the arrival rate of jumps is infinite and decreasing in the jump size. Such models are also motivated by theoretical no-arbitrage considerations, implying that the prices must be modeled as time-changed Brownian motion. If, as is argued, this time change has to be modeled as random, we are led to the class of discontinuous price processes. Being of bounded variation, these prices are also more robust relative to change of parameters than the typical diffusion models. The example of the so-called variance gamma process is presented in detail, including solutions to option pricing and optimal investment problems in such a market model.
Using these solutions, the model is calibrated, which is in turn used to infer trader preferences and personalized risk neutral measures, called position measures. The paper is representative of a very active field of research, rich in theoretical and practical implications.

Part II presents different aspects of the theory and practice of interest rate modeling. Arbitrage-free movement of the forward curve is analyzed from the perspective of infinite dimensional diffusions by T. Björk in his article “A geometric view of interest rate theory”. He addresses the following questions: when is a given forward rate model consistent with a given family of forward rate curves and when can the inherently infinite dimensional forward rate process be realized by means of a finite-dimensional state space model? Necessary and sufficient conditions for consistency as well as for the existence of finite-dimensional realizations are given in terms of forward rate volatilities. That is, the forward rate model generated by a collection of volatility functions admits a finite dimensional realization if and only if the corresponding Lie algebra generated by the volatility functions and the
drift (which is also uniquely determined from the volatility functions by arbitrage considerations) is finite-dimensional in the neighbourhood of the initial condition. General consistency results are not given in this chapter, though references are made to the recent papers and the PhD thesis by D. Filipović. Instead, the author concentrates on analysis of the Nelson–Siegel (NS) family of forward curves. It turns out that neither the Hull–White (HW) nor the Ho–Lee (HL) model is consistent with the NS family. In fact the NS manifold is too small for the HW and HL models, in the sense that if the initial curve is on the manifold, then the models will force the term structure off the manifold within an arbitrarily short period of time.

The infinite-dimensional approach is also taken in the chapter “Infinite dimensional diffusions, Kolmogorov equations and interest rate models” by B. Goldys and M. Musiela. The main emphasis is put on differential analysis in infinite dimension. Motivation comes from the need for a better understanding of interest rate risk management issues. To be more precise let us look first at the Black–Scholes model. The lognormal diffusion process generating the arbitrage-free evolution of the variable of interest can also be represented by its infinitesimal generator. Pricing of options is equivalent to solving the related Kolmogorov equation. Sensitivity to a change in the stochastic variable is obtained by simple differentiation of the price. The situation in the interest rate area is more complex. The underlying stochastic variable is the entire forward curve. The diffusion process defining the evolution of the forward curve is infinite-dimensional. The infinitesimal generator and the corresponding Kolmogorov equation need to be defined and studied from the perspective of the sensitivity of an interest rate option to the changes in the shape of the forward curve.
It turns out that one can obtain Feynman–Kac representations of solutions to such equations for a large class of terminal conditions (which include most of the treated products) and that for those the price is differentiable with respect to the initial forward curve. This is in contrast with poor smoothing properties of the associated semigroup and the fact that not all the payoffs have discounted expected values which are Fréchet differentiable.

While continuous compounding associated with the continuous tenor models may ultimately lead to more unified infinite-dimensional theories of the forward curve dynamics, at the implementation level one is almost forced to work with models allowing for finite-dimensional realizations. On the other hand, simple compounding corresponding to a given discrete tenor structure has the advantage of being grounded on standard finite-dimensional semimartingale theory, which is better understood and more developed. Additionally, it represents the interest rate markets more realistically. As such, it is arguably better suited for the pricing of most Libor and swap derivatives. The canonical forward Libor and swap rate models with deterministic volatilities are by construction
finite-dimensional diffusions under any of the Libor measures (spot or forward). The explicit relationships between the measures allow for the development of exact expressions or at least of good analytic approximations to a number of options such as caps and swaptions. The chapter “Modelling of forward Libor and swap rates” by M. Rutkowski presents an overview of recently developed methodologies related to the derivation and analysis of the arbitrage-free dynamics of such market rates.

The article “Towards a central interest rate model” by A. Brace, T. Dun and G. Barton aims to expose issues related to the implementation of the canonical lognormal forward Libor model. The pricing of swaptions is examined within this framework and compared to the industry standard Black swaption formula, and, by extension, to the lognormal swap rate model. Swap and swaption behaviour are investigated under arbitrary volatility and yield curve specifications. Simulation and approximation techniques are used to make comparisons in terms of observed swap rate probability distributions, swaption volatilities and prices, and swaption sensitivities defined in terms of the swap rate. Fifteen swaptions and two volatility structures are considered. Swap rates simulated under the lognormal Libor model are shown to be statistically lognormal in each case, and volatilities, prices and Greeks agree closely. Finally, the approximate delta value within the lognormal Libor model is used in a simulated delta-hedging exercise and is seen to successfully hedge Libor model swaptions. This points to the robustness of the lognormal Libor model for the following two reasons. Firstly, the exact delta of a swaption, in a lognormal Libor model, is, in fact, the vector of partial derivatives of the swaption price with respect to the underlying forward Libor rates. Secondly, the volatility of the forward swap rate under the corresponding forward swap rate measure, in the lognormal Libor model, is stochastic.
Overall, in the authors’ opinion, the forward Libor model is the unifying model capable of encompassing the properties of the swap rate model and allowing for greater aggregation of risk in portfolios containing Libor and swap derivatives.

The third Part considers different types of risk in financial markets, and ways to manage and hedge exposure to risk. “Credit risk modelling: an intensity based approach” by T. Bielecki and M. Rutkowski reviews fundamental methodologies and results in the area of the intensity based default and credit risk modeling. Special care is devoted to the technical issues of the role of conditioning information in computations involving random times. The time of default is modeled via a jump process with positive jump intensity. An overview of credit-risk instruments is provided, together with market methods for pricing them. Next, the basic theory of valuation of defaultable claims is presented, and various specifications for modeling recovery value at or after the time of default are discussed. Moreover, models that account for the migration between credit-rating grades are surveyed, both in discrete-time and continuous-time. A credit-spread based HJM-type model
is presented, in which default-free and defaultable term structures are modeled. Finally, the theory is applied to the problem of valuation of some common credit derivatives. The area of credit and default risk has been very active and popular in recent years, both in financial industry practice and in academic research.

The primary purpose of the article “Towards a theory of volatility trading” by P. Carr and D. Madan is to review three methods which recently emerged for trading realized volatility. The first method involves taking a static position in options. The classic example is that of a long position in a straddle. The second method involves delta-hedging of an option. If an investor is successful in hedging away the price risk, then a prime determinant of the profit or loss from this strategy is the difference between the realized volatility and the anticipated volatility used in pricing and hedging the option. The final method reviewed for trading realized volatility involves buying or selling an over-the-counter contract whose payoff is an explicit function of volatility. The simplest example of such a volatility contract is a volatility swap. This contract pays the buyer the difference between the realized volatility and a level of volatility fixed at the outset of the contract. A secondary purpose is to uncover the link between volatility contracts and some recent ground-breaking work by Dupire and Derman, Kani, and Kamal. By restricting the set of times and price levels for which returns are used in volatility calculations, one can synthesize a contract which pays off the “local volatility”.

The contribution by P. Glasserman, “Shortfall risk in long-term hedging with short-term futures contracts”, proposes and analyzes a measure of the risk of a cash shortfall in hedging a risky position over time. The measure is illustrated by comparing various hedging strategies for a firm hedging a long-term commitment with short-dated futures contracts.
It is motivated by the infamous case of derivatives losses suffered by Metallgesellschaft Refining and Marketing. The firm had entered into long-term contracts to supply oil at fixed prices, and was hedging these commitments with short-term futures contracts. While the strategy would have produced, at least theoretically, a perfect hedge at the end of the long-term contract, it led to a severe cash shortfall during the life of the contract. In a Gaussian model the theory of Gaussian extremes and large deviation approximations are used to calculate this measure, to capture qualitative features of the shortfall risk and to identify the most likely path to a shortfall under different hedging strategies. A brief summary of concepts pertinent to futures and forwards is provided in an appendix. The theory for analyzing liquidity risks is only in its infancy, and this paper indicates some possible ways for making progress in developing it.

M. Schweizer’s contribution “A guided tour through quadratic hedging approaches” gives an overview of the general theory of pricing and hedging contingent claims in incomplete markets by means of a quadratic criterion. It is
based on numerous papers by the author and his co-workers. It is an example of an abstract theory developed for very practical problems, since many models used in practice are, indeed, incomplete. The paper explains the notions of local risk-minimization, the minimal martingale measure, the variance-optimal martingale measure, mean-variance hedging, Föllmer–Schweizer decomposition, and so on. It first discusses the case in which the hedging strategies are not required to be self-financing. If the discounted price process is a local martingale, one can find a risk-minimizing strategy, which is also mean self-financing. In the general case, one can only find so-called locally risk-minimizing strategies. In the last part of the article, the mean-variance criterion is considered for those strategies that are required to be self-financing, and the connection to closedness properties of spaces of stochastic integrals is studied. Despite the significant progress that has been made on these problems over the years, and the success of complete characterization of solutions in special cases, in general, questions about how to actually construct optimal strategies remain open, and the search for those solutions is still ongoing.

The companion chapter “Numerical comparison of local risk-minimization and mean-variance hedging” by D. Heath, E. Platen and M. Schweizer focuses on the more practical aspects of the two criteria. It begins with the concrete situation of a Markovian stochastic volatility setting and there provides general comparative results on prices, hedging strategies and risks for local risk-minimization versus mean-variance hedging. A detailed analysis including numerical results is then performed for the well-known Heston and Stein/Stein stochastic volatility models. The results highlight some important quantitative differences between the two approaches and give some directions for future research.

Part IV contains papers on the optimal portfolio selection problem. The article “Theory of portfolio optimization in markets with frictions” by one of the editors (J.C.) surveys results on extending Merton’s classical utility maximization problem in continuous-time models driven by Brownian motion to the case of markets which are incomplete due to the presence of portfolio constraints, transaction costs, different borrowing and lending rates, and so on. The methodology employed is to first characterize the minimal cost of super-replicating a given claim in such markets, and then solve an optimization problem dual to the utility maximization problem. If the dual problem is appropriately defined, it can then be shown, using the results on super-replication, that the optimal strategy can be characterized in terms of the solution to the dual problem. Explicit results are available for many examples in the case of portfolio constraints and different borrowing and lending rates, but not in the case of transaction costs. In terms of open problems, as far as the general theory is concerned, some of these
results have not yet been fully extended to general arbitrage-free semimartingale models.

“Bayesian adaptive portfolio optimization” by I. Karatzas and X. Zhao also considers the portfolio optimization problem, but in the framework of the stock return rates being unobserved by the investor. Instead, they are modeled in a Bayesian fashion, as a random vector with a known probability distribution. The investor is assumed to observe past and present stock prices, and has to base investment decisions only on that information. The value function is obtained using both filtering/martingale and stochastic control/partial differential equation techniques. The former approach transforms the problem into one with the drift process adapted to the observation process, while the latter approach is used to show that the Hamilton–Jacobi–Bellman equation for this problem takes the form of a generalized Monge–Ampère equation, which is solved fairly explicitly. Next, it is shown that, for the logarithmic utility function, the cost of uncertainty about the unknown drift of the stock prices (relative to an investor who can observe the drift) is asymptotically negligible. The results are also extended to the case of portfolio constraints. The article is a contribution to the very lively line of research in financial economics and mathematics dealing with problems of incomplete or asymmetric information.

The editors would like to express their gratitude to the individuals who made the book possible. Thanks are above all due to all the contributors – they have worked with us with enthusiasm and efficiency, making the editorial job truly enjoyable. The project would not have been possible without the immense efforts, support and vision of David Tranah of Cambridge University Press. We are sincerely grateful for his high professionalism and constant encouragement. We are also thankful to Elsevier, for permitting us to reprint the paper by Boyle, Broadie and Glasserman in this book.
J.C., E.J. and M.M.

Part one Option Pricing: Theory and Practice

1 Arbitrage Theory Yu. M. Kabanov

1 Introduction

We shall consider models where an investor, acting on a financial market with random price movements and having $T$ as his time horizon, transforms the initial endowment $\xi$ into a certain resulting wealth; let $R_T^\xi$ denote the set of all final wealth corresponding to possible investment strategies. The natural question is whether the investor has arbitrage opportunities, i.e. whether he can get non-risky profits. Let us “hide” in a “black box” the interior dynamics on the time-interval $[0, T]$ (i.e. the price process specification, market regulations, description of admissible strategies) and examine only the set $R_T^\xi$. At this level of generality, the answer, as well as the hypotheses, should be formulated only in terms of properties of the sets $R_T^\xi$. E.g., in the simplest situation of a frictionless market without constraints, $R_T^0$ is a linear subspace in the space $L^0$ of (scalar) random variables and $R_T^\xi = \xi + R_T^0$. The absence of arbitrage opportunities can be formalized by saying that the intersection of $R_T^0$ with the set $L^0_+$ of non-negative random variables contains only zero. If the underlying probability space is finite, i.e. if we assume in our model only a finite number of states of the nature, it is easy to prove that there is no arbitrage if and only if there exists an equivalent “separating” probability measure with respect to which every element of $R_T^0$ has zero mean. A close look at this result shows that this assertion is nothing but the Stiemke lemma [62] of 1915, which is well-known in the theory of linear inequalities and linear programming as an example of the so-called alternative (or transposition) theorems, see historical comments in [61]; notice that the earliest alternative theorem, due to Gordan [21] (of 1873), can also be interpreted as a no-arbitrage criterion. The one-step model can be generalized (or specialized, depending on the point of view) in many directions giving rise to what is called arbitrage theory.
The reader should not be confused by the use of “general” and “special” in this context: obviously, one-step models are particular cases of $N$-period models, but quite often the main difficulties in the analysis of models with a detailed (“specialized”) structure of the “black box” consist in verifying the hypotheses of theorems corresponding to the one-step case. The geometric essence of these results is a separation of convex sets with a subsequent identification of the separating functional as a probability measure; the properties of the latter in connection with the price process are of particular interest. To date one can find in the literature dozens of models of financial markets together with a plethora of definitions of arbitrage opportunities. These models can be classified using the following scheme.

1.1 Finite probability space

Assuming only a finite number of states of nature is popular in the literature on economics. Of course, the hypothesis is not adequate to the basic paradigm of stochastic modeling because random variables with continuous distributions cannot “live” on finite probability spaces. The advantage of working under this assumption is that a very restricted set of mathematical tools (basically, elementary finite-dimensional geometry) is required. Results obtained in this simplified setting have an important educational value and quite often may serve as the starting point for a deeper development.
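To make the finite setting concrete, the following is a minimal sketch, not taken from the text, of the one-step criterion for a single risky asset with a constant numéraire: either the price increment takes both signs (or vanishes identically), in which case a strictly positive separating probability can be written down explicitly, or it is one-sided and an arbitrage exists. The function name `separating_measure` and the sample increments are illustrative assumptions.

```python
from fractions import Fraction

def separating_measure(dS):
    """Return strictly positive probabilities q with sum(q_i * dS_i) == 0, or None.

    dS[i] is the price increment of the single risky asset in state i
    (hypothetical data). Every state is assumed to have positive probability
    under P, so "equivalent" simply means q_i > 0 for all i.
    """
    dS = [Fraction(x) for x in dS]
    gain = sum(x for x in dS if x > 0)    # total size of the up-moves
    loss = sum(-x for x in dS if x < 0)   # total size of the down-moves
    if gain == 0 and loss == 0:
        weights = [Fraction(1)] * len(dS)  # dS == 0: any measure separates
    elif gain == 0 or loss == 0:
        return None                        # one-sided increments: arbitrage
    else:
        # Up-states get weight `loss`, down-states weight `gain`,
        # so the weighted sum of dS is loss*gain - gain*loss = 0.
        weights = [loss if x > 0 else gain if x < 0 else Fraction(1) for x in dS]
    total = sum(weights)
    return [w / total for w in weights]

q = separating_measure([-1, 0, 2])
print(q)
print(separating_measure([0, 1, 2]))
```

The explicit weights are only one of many possible choices; in general the separating measure is not unique, which is the situation of an incomplete market.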

1.2 General probability space

In contrast to the case of finite probability space, the straightforward separation arguments, which are the main instruments to obtain no-arbitrage criteria, cannot be applied without further topological assumptions on $R_T^0$. In many particular cases, especially in the theory of continuous trading, these assumptions are not fulfilled. This circumstance led Kreps (1981) to a more sophisticated “no-arbitrage” concept, namely, that of “no free lunch” (NFL). However, certain no-arbitrage criteria are of the same form as for the models with finite $\Omega$.

1.3 Discrete-time multi-period models

Even for the case of finite $\Omega$, these models are important because they allow us to describe the intertemporal behavior of investors in financial markets, i.e. to penetrate into the structure of the “black box” using concepts of random processes. One of the most interesting features is that in the simplest model without constraints the value processes of the investor’s portfolios are martingales with respect to separating measures and the same property holds for the underlying price process; this explains the terminology “equivalent martingale measures”. Models based on infinite $\Omega$ posed challenging mathematical questions, e.g., whether the absence of arbitrage is still equivalent to the existence of an equivalent martingale measure. For a frictionless market the affirmative answer was given by Dalang, Morton, and Willinger (1990). Their work, together with the earlier paper of Kreps, stimulated further research in geometric functional analysis and stochastic calculus, involving rather advanced mathematics.

1.4 Continuous trading

Although continuous-time stochastic processes were used for modeling from the very beginning of mathematical finance (one can say that they were even invented exactly for this purpose, having in mind the Bachelier thesis “Théorie de la spéculation” where Brownian motion appeared for the first time), their “golden age” began in 1973 when the famous Black–Scholes formula was published. Subsequent studies revealed the role of the uniqueness of the equivalent martingale measure for pricing of derivative securities via replication. The importance of no-arbitrage criteria seems to be overestimated in the financial literature: the unfortunate alias FTAP, for Fundamental Theorem of Asset (or Arbitrage) Pricing, ambitious and misleading, is still widely used. If there are many equivalent martingale measures, the idea of “pricing by replication” fails: a contingent claim may not belong to $R_T^x$ whatever $x$ is, or may belong to many $R_T^x$. In the latter case it is not clear which martingale measure can be used for pricing and this is the central problem of current studies on incomplete markets. However, as to mathematics, the no-arbitrage criteria for general semimartingale models are considered among the top achievements of the theory. In 1980 Harrison and Pliska noticed that stochastic calculus, i.e. the integration theory for semimartingales, developed by P.-A. Meyer in a purely abstract way, is “tailor-made” for financial modeling. In 1994 Delbaen and Schachermayer confirmed this conclusion by proving that the absence of arbitrage in the class of elementary, “practically admissible” strategies implies the semimartingale property of the price process. In a series of papers they provided a profound analysis of the various concepts culminating in a result that the Kreps NFL condition (equivalent to a whole series of properties with easier economic interpretation) holds if and only if the price process is a σ-martingale under some $\tilde P \sim P$.

There is another justification of the increasing interest in semimartingales in financial modeling: mathematical statistics sends alarming signals that in many cases empirical data for financial time series are not compatible with the hypothesis that they are generated by processes with continuous sample paths. Thus, diffusions should be viewed only as strongly stylized models of financial data; it has been revealed that Lévy processes give a much better fit.

1.5 Large financial markets

This particular group, including the so-called Arbitrage Pricing Model (or Theory), abbreviated to APM (or APT), due to Ross and Huberman (for the one-period case), has the following specific feature. In contrast with the conventional approach of describing a security market by a single probabilistic model, a sequence of stochastic bases with an increasing but always finite number of assets is considered. One can think that the agent wants to concentrate his activity on smaller portfolios because of his physical limitations, but larger portfolios in this market may have better performance. Arbitrage is understood in an asymptotic sense. Its absence implies relationships between model parameters which can be verified empirically. This circumstance makes such models especially attractive. The weak side of APM is the use of the quadratic risk measure. This means that gains are penalized together with losses in a symmetric way, which is unrealistic. Luckily, the conclusion of APM, the Ross–Huberman boundedness condition, seems to be sufficiently “robust” with respect to the risk measure and the variation of certain model parameters. In the recent papers [36] and [37], where the theory of large financial markets was extended to the general semimartingale framework, the concept of asymptotic arbitrage is developed for an “absolutely” risk-averse agent. In spite of a completely different approach, the absence of asymptotic arbitrage implies, for various particular models, relations similar to the Ross–Huberman condition.

1.6 Models with transaction costs

In the majority of models discussed in mathematical finance, the investor’s wealth is scalar, i.e. all positions are measured in units of a single asset (money, bond, bank account, etc.). However, in certain cases, e.g., in models with constraints and, especially, in those taking transaction costs into account, it is quite natural to consider, as the primary object, the whole vector-valued process of current positions, either in physical quantities or in units of values measured by a certain numéraire. It happens that this approach not only allows for a more detailed and realistic description of the portfolio dynamics but also opens new perspectives for further mathematical development, in particular, for an extensive use of ideas from the theory of partially ordered spaces, utility theory, optimal control, and mathematical economics. Until now only a few results are available in this new branch of arbitrage theory. Recent studies [34] and [41] show that the basic concept of arbitrage theory, that of the equivalent martingale measure, should be modified and generalized in an appropriate way. There are various approaches to the problem which will be discussed here. Notice that models with transaction costs quite often were considered as completely different from those of a frictionless market and the classical results could not be obtained as corollaries when transaction costs vanish. The modern trend in the theory is to work in a framework which covers the latter as a special case.

Arbitrage theory includes another, even more important subject, namely, hedging theorems, closely related to the no-arbitrage criteria. These results, discussed in the present survey in a sketchy way, give answers to whether a contingent claim can be replicated in an appropriate sense by a terminal value of a self-financing portfolio or whether a given initial endowment is sufficient to start a portfolio replicating the contingent claim. Other related problems such as market completeness or models with a continuum of securities, arising in the theory of bond markets, are not touched here.

The books [52], [57], and [29] may serve as references in convex analysis, probability, and stochastic calculus.

2 Discrete-time models

2.1 General setting

Let $(\Omega, \mathcal F, \mathbf F = (\mathcal F_t), P)$ be a stochastic basis (i.e. a filtered probability space), $t = 0, 1, \dots, T$. We assume that each σ-algebra $\mathcal F_t$ is complete. We are given:
• convex cones $R_t^0 \subseteq L^0(\mathbf R^d, \mathcal F_t)$;
• closed convex cones $\mathcal K_t \subseteq L^0(\mathbf R^d, \mathcal F_t)$.

The notation $L^0(K_t, \mathcal F_t)$ is used for the set of all $\mathcal F_t$-measurable random variables with values in the set $K_t$ (or $\mathcal F_t$-measurable selectors of $K_t$ if $K_t$ depends on $\omega$). The usual financial interpretation: $R_t^0$ is the set of portfolio values at the date $t$ corresponding to the zero initial endowment, i.e. all imaginable results that can be obtained by the investor by the date $t$. The cones $\mathcal K_t$ induce the partial orderings in the sets $L^0(\mathbf R^d, \mathcal F_t)$:
$$\xi \ge_t \eta \iff \xi - \eta \in \mathcal K_t.$$
The partial orderings $\ge_t$ allow us to compare current results. As a rule, they are obtained by “lifting” partial orderings from $\mathbf R^d$ to the space of random variables.

A typical example: $\mathcal K_t = L^0(K, \mathcal F_t)$ where $K$ is a closed cone in $\mathbf R^d$ (which may depend on $\omega$ and $t$). In particular, the “standard” ordering $\ge_t$ is induced by $K_t = \mathbf R_+^d$, when $\xi \ge_t \eta$ if $\xi^i \ge \eta^i$ (a.s.) for all $i \le d$; for the case $d = 1$ it is the usual linear ordering of the real line. However, we do not exclude other partial orderings. In the theory of frictionless market, usually, $d = 1$; for models with transaction costs $d$ is the number of assets in the portfolio.

We define also the set $A_T^0 := R_T^0 - \mathcal K_T$. The elements of $A_T^0$ are interpreted as contingent claims which can be hedged (or super-replicated) by the terminal values of portfolios starting from zero. The linear space $\mathcal L_T := \mathcal K_T \cap (-\mathcal K_T)$ describes the positions $\xi$ such that $\xi \ge_T 0$ and $\xi \le_T 0$, which are “financially equivalent to zero”. The comparison of results can be done modulo this equivalence, i.e. in the quotient space $L^0/\mathcal L_T$ equipped with the ordering induced by the proper cone $\tilde{\mathcal K}_T := \pi_T \mathcal K_T$ where $\pi_T : L^0 \to L^0/\mathcal L_T$ is the natural projection.

2.2 No-arbitrage criteria for finite $\Omega$

The most intuitive formulation of the property that the market has no arbitrage opportunities for the investors without initial capital is the following:

NA. $\mathcal K_T \cap R_T^0 \subseteq \mathcal L_T$.

In the particular case when $\mathcal K_T$ is a proper cone we have

NA′. $\mathcal K_T \cap R_T^0 \subseteq \{0\}$ (with equality if $R_T^0$ is closed).

The first no-arbitrage criterion has the following form.

Theorem 2.1 Let $\Omega$ be finite. Assume that $R_T^0$ is closed. Then NA holds if and only if there exists $\eta \in L^0(\mathbf R^d, \mathcal F_T)$ such that
$$E\eta\zeta > 0 \quad \forall \zeta \in \mathcal K_T \setminus \mathcal L_T$$
and
$$E\eta\zeta \le 0 \quad \forall \zeta \in R_T^0.$$

Because $L^0$ is a finite-dimensional space, this result is a reformulation of Theorem A.2 on separation of convex cones. It is easy to verify that $\mathcal K_T \cap R_T^0 \subseteq \mathcal L_T$ if and only if $\mathcal K_T \cap A_T^0 \subseteq \mathcal L_T$. Hence, in this theorem one can replace $R_T^0$ by $A_T^0$. The above criterion can be classified as a result for the one-step model where $T$ stands for “terminal”. It has important corollaries for multi-period models where the sets $R_T^0$ have a particular structure.
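As an illustration of how the two conditions of Theorem 2.1 read in the most familiar case, here is a short worked specialization. It assumes the scalar frictionless setting, which is an assumption of this sketch and not part of the theorem: $d = 1$, $\mathcal K_T = L^0(\mathbf R_+, \mathcal F_T)$ (so that $\mathcal L_T = \{0\}$), $R_T^0$ a linear subspace, and $\mathcal F_T$ containing all singletons of the finite $\Omega$.

```latex
% Assumptions (illustrative): d = 1, \mathcal K_T = L^0(\mathbf R_+, \mathcal F_T),
% hence \mathcal L_T = \{0\}; R_T^0 is a linear subspace; \Omega is finite and
% \mathcal F_T contains all singletons.
\begin{align*}
E\eta\zeta > 0 \ \ \forall \zeta \in \mathcal K_T \setminus \{0\}
  &\iff \eta(\omega) > 0 \ \text{for every } \omega \text{ with } P(\{\omega\}) > 0
  && (\text{take } \zeta = I_{\{\omega\}}),\\
E\eta\zeta \le 0 \ \ \forall \zeta \in R_T^0
  &\iff E\eta\zeta = 0 \ \ \forall \zeta \in R_T^0
  && (\text{apply the inequality to } \zeta \text{ and } -\zeta).
\end{align*}
```

Under these assumptions the separating $\eta$ is thus a strictly positive random variable which, after normalization to $E\eta = 1$, is the density of an equivalent measure giving zero mean to every element of $R_T^0$.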

3 Multi-step models

3.1 Notations

For $X = (X_t)_{t \ge 0}$ and $Y = (Y_t)_{t \ge 0}$ we define $X_- := (X_{t-1})$ (various conventions for $X_{-1}$ can be used), $\Delta X_t := X_t - X_{t-1}$, and, at last,
$$X \cdot Y_t := \sum_{k=0}^{t} X_k \Delta Y_k$$
for the discrete-time integral. Here $X$ and $Y$ can be scalar or vector-valued. In the latter case sometimes we shall use the abbreviation $X \bullet Y$ for the vector process formed by the pairwise integrals of the components
$$X \bullet Y := (X^1 \cdot Y^1, \dots, X^d \cdot Y^d).$$
Though in the discrete-time case the dynamics can be expressed exclusively in terms of differences, “integral” formulae are often instructive for continuous-time extensions. For finite $\Omega$, if $X$ is a predictable process (i.e. $X_t$ is $\mathcal F_{t-1}$-measurable) and $Y$ belongs to the space $\mathcal M$ of martingales, then $X \cdot Y$ is also a martingale. The product formula
$$\Delta(XY) = X \Delta Y + Y_- \Delta X$$
is obvious.
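The notation above can be exercised on a small deterministic example. The sketch below is illustrative only: it fixes the convention $X_{-1} = X_0$ (one of the admissible conventions, and the one used later for $S$), and the sample paths are hypothetical. It computes the discrete-time integral and checks the product formula term by term.

```python
def delta(x):
    """Increments dX_t = X_t - X_{t-1}, with the convention X_{-1} = X_0."""
    return [0] + [x[t] - x[t - 1] for t in range(1, len(x))]

def integral(x, y):
    """Discrete-time integral (X . Y)_t = sum_{k<=t} X_k * dY_k, for t = 0..T."""
    out, acc = [], 0
    for xk, dyk in zip(x, delta(y)):
        acc += xk * dyk
        out.append(acc)
    return out

X = [2, 3, 5, 4]       # hypothetical sample paths
Y = [1, 4, 2, 7]
XY = [a * b for a, b in zip(X, Y)]
Y_minus = [Y[0]] + Y[:-1]          # Y_- with the convention Y_{-1} = Y_0

# Product formula: d(XY) = X dY + Y_- dX, checked term by term.
lhs = delta(XY)
rhs = [x * dy + ym * dx for x, dy, ym, dx in zip(X, delta(Y), Y_minus, delta(X))]
print(integral(X, Y), lhs == rhs)
```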

3.2 Example 1. Model of frictionless market

The model being classical, we do not give details and financial interpretations: they are widely available in many textbooks. Let $S = (S_t)$, $t = 0, 1, \dots, T$, be a fixed $n$-dimensional process adapted to a discrete-time filtration $\mathbf F = (\mathcal F_t)$. Here $T$ is a finite integer and, for simplicity, the σ-algebra $\mathcal F_0$ is assumed to be trivial. The convention $S_{-1} = S_0$ is used. Define $R_T^0$ as the linear space of all scalar random variables of the form $N \cdot S_T$ where $N$ is an $n$-dimensional predictable process. For $x \in \mathbf R$ we put $R_T^x = x + R_T^0$. We take $\mathcal K_0 := \mathbf R_+$ and $\mathcal K_T := L^0(\mathbf R_+, \mathcal F_T)$. The components $S^i$ describe the price evolution of $n$ risky securities, $N^i$ is the portfolio strategy which is self-financing, and $V = x + N \cdot S$ is the value process. In this specification it is tacitly assumed that there is a traded asset with the constant unit price, i.e. this asset is the numéraire.

Remark 3.1 One should take care that there is another specification where the numéraire is not necessarily a traded asset. A possible confusion may arise because the formula for the value process looks similar but the integrand and the integrator are in the latter case $d$-dimensional processes with $d = n + 1$. The increments of a self-financing portfolio strategy are explicitly constrained by the relation $S_{t-1} \Delta N_t = 0$. If the numéraire (“cash” or “bond”) is traded, the integral with respect to the latter vanishes but, of course, holdings in “cash” are not arbitrary but defined from the above relation.

For finite $\Omega$ we have, in virtue of Theorem 2.1, that the model has no arbitrage if and only if there is a strictly positive random variable $\eta$ such that $E\eta\zeta = 0$ for all $\zeta \in R_T^0$. Without loss of generality we may assume that $E\eta = 1$ and define the probability measure $\tilde P = \eta P$. Clearly, $\tilde E\zeta = 0$ for all $\zeta \in R_T^0$ (i.e. $\tilde E N \cdot S_T = 0$ for all predictable $N$) if and only if $S$ is a $\tilde P$-martingale. With this remark we get the Harrison–Pliska theorem:

Theorem 3.2 Assume that $\Omega$ is finite. Then the following conditions are equivalent:
(a) $R_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$ (no-arbitrage);
(b) there exists a measure $\tilde P \sim P$ such that $S \in \mathcal M(\tilde P)$.

Let $\rho_t := d\tilde P_t / dP_t$ be the density corresponding to the restrictions of $\tilde P$ and $P$ to $\mathcal F_t$. Recall that the density process $\rho = (\rho_t)$ is a martingale $\rho_t = E(\rho_T | \mathcal F_t)$. Since
$$S \in \mathcal M(\tilde P) \iff S\rho \in \mathcal M(P),$$
we can add to the conditions of the above theorem the following one:
(b′) there is a strictly positive martingale $\rho$ such that $\rho S \in \mathcal M$.

Notice that the equivalence of (b) and (b′) is a general fact which holds for arbitrary $\Omega$ and even in the continuous-time setting. Though the property (b′) can be considered simply as a reformulation of (b), it is more adapted to various extensions. The advantage of (b) is in the interpretation of $\tilde P$ as a “risk-neutral” probability.
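The implication (b) ⇒ (a) can be illustrated numerically. The sketch below is an assumption-laden toy and not part of the text: it uses a three-period binomial tree with hypothetical factors `u` and `d`, a constant numéraire, and an arbitrary predictable rule `strategy`. It computes the one-step martingale probability and checks, by enumerating all paths with exact rational arithmetic, that the terminal gain $N \cdot S_T$ has zero mean under the martingale measure.

```python
from fractions import Fraction
from itertools import product

u, d = Fraction(2), Fraction(1, 2)   # hypothetical up/down factors, d < 1 < u
T, S0 = 3, Fraction(1)
q_up = (1 - d) / (u - d)             # one-step martingale probability

def path_prices(moves):
    """Prices S_0..S_T along a path of moves (True = up)."""
    prices = [S0]
    for up in moves:
        prices.append(prices[-1] * (u if up else d))
    return prices

def strategy(t, prices):
    """An arbitrary predictable rule: N_t uses prices up to time t-1 only."""
    return prices[t - 1] ** 2 - 3 * t

expected_gain = Fraction(0)
for moves in product([True, False], repeat=T):
    prices = path_prices(moves)
    ups = sum(moves)
    prob = q_up ** ups * (1 - q_up) ** (T - ups)   # mass of the path under P~
    gain = sum(strategy(t, prices) * (prices[t] - prices[t - 1])
               for t in range(1, T + 1))            # (N . S)_T
    expected_gain += prob * gain

print(q_up, expected_gain)
```

Any other rule depending only on past prices could be substituted for `strategy`; the zero mean is a consequence of the martingale property of $S$, not of the particular choice made here.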

3.3 Example 2. Model with transaction costs

Now we describe a discrete-time version of a multi-currency model with proportional transaction costs introduced in [34] and studied in the papers [11] and [41]. It is assumed that the components of an adapted process $S = (S_t^1, \dots, S_t^d)$, $t = 0, 1, \dots, T$, describing the dynamics of prices of certain assets, e.g., currencies quoted in a certain reference asset (say, “euro”), are strictly positive. It is convenient to choose the scales to have $S_0^i = 1$ for all $i$. We do not suppose that the numéraire is a traded security. The transaction costs coefficients are given by an adapted process $\Lambda = (\lambda^{ij})$ taking values in the set $\mathbf M_+^d$ of non-negative $d \times d$-matrices with zero diagonal.

The agent’s portfolio at time $t$ can be described either by a vector of “physical” quantities $\hat V_t = (\hat V_t^1, \dots, \hat V_t^d)$ or by a vector $V_t = (V_t^1, \dots, V_t^d)$ of values invested in each asset. The relation
$$\hat V_t^i = V_t^i / S_t^i, \quad i \le d,$$
is obvious. Introducing the diagonal operator
$$\varphi_t(\omega) : (x^1, \dots, x^d) \mapsto (x^1/S_t^1(\omega), \dots, x^d/S_t^d(\omega)), \tag{1}$$
we may write that $\hat V_t = \varphi_t V_t$.

The increments of portfolio values are
$$\Delta V_t^i = \hat V_{t-1}^i \Delta S_t^i + \Delta b_t^i \tag{2}$$
with
$$\Delta b_t^i = \sum_{j=1}^{d} \alpha_t^{ji} - \sum_{j=1}^{d} (1 + \lambda^{ij}) \alpha_t^{ij},$$
where $\alpha_t^{ji} \in L^0(\mathbf R_+, \mathcal F_t)$ represents the net amount transferred from the position $j$ to the position $i$ at the date $t$. The first term on the right-hand side of (2) is due to the price increment while the second corresponds to the agent’s actions (made after the revealing of new prices). Notice that these actions are charged by the amount
$$-\sum_{i=1}^{d} \Delta b_t^i = \sum_{i=1}^{d} \sum_{j=1}^{d} \lambda^{ij} \alpha_t^{ij},$$
diminishing the total portfolio value.

With every $\mathbf M_+^d$-valued process $(\alpha_t)$ and any initial endowment $v = V_{-1} \in \mathbf R^d$ we associate, using recursively the formula (2), a value process $V = (V_t)$, $t = 0, \dots, T$. The terminal values of these processes form the set $R_T^v$.

Remark 3.3 In the literature one can find other specifications for transaction costs coefficients. To explain the situation, let us define $\tilde\alpha^{ij} := (1 + \lambda^{ij}) \alpha^{ij}$. The increment of value of the $i$-th position can be written as
$$\Delta b_t^i = \sum_{j=1}^{d} \mu^{ji} \tilde\alpha_t^{ji} - \sum_{j=1}^{d} \tilde\alpha_t^{ij},$$
where $\mu^{ji} := 1/(1 + \lambda^{ji}) \in\, ]0, 1]$. The matrix $(\mu^{ij})$ can be specified as the matrix of the transaction costs coefficients. In models with a traded numéraire, i.e. a non-risky asset, a mixture of both specifications is used quite often.

Before analyzing the model, we write it in a more convenient way reducing the dimension of the action space. To this aim we define, for every $(\omega, t)$, the convex cone
$$M_t(\omega) := \Big\{ x \in \mathbf R^d : \exists\, a \in \mathbf M_+^d \text{ such that } x^i = \sum_{j=1}^{d} \big[(1 + \lambda_t^{ij}(\omega)) a^{ij} - a^{ji}\big],\ i \le d \Big\},$$
which is a polyhedral one as it is the image of the polyhedral cone $\mathbf M_+^d$ under a linear mapping. Its dual positive cone
$$M_t^*(\omega) := \Big\{ w \in \mathbf R^d : \inf_{x \in M_t(\omega)} wx \ge 0 \Big\}$$
can be easily described by linear homogeneous inequalities. Specifically,
$$M_t^*(\omega) = \{ w \in \mathbf R^d : w^j - (1 + \lambda_t^{ij}(\omega)) w^i \le 0,\ 1 \le i, j \le d \}.$$
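The description of the dual cone by inequalities can be checked against its definition on a small example. The sketch below is illustrative: the cost matrix `lam` and the candidate vectors are hypothetical, and the cone $M_t(\omega)$ is represented by its generators, obtained by letting the matrix $a$ have a single unit entry $a^{ij} = 1$, so that $x^i = 1 + \lambda^{ij}$, $x^j = -1$ and all other components vanish.

```python
from fractions import Fraction

# Hypothetical 3-asset cost matrix lam[i][j] >= 0 with zero diagonal.
lam = [[Fraction(0), Fraction(1, 10), Fraction(1, 5)],
       [Fraction(1, 20), Fraction(0), Fraction(1, 10)],
       [Fraction(1, 4), Fraction(1, 10), Fraction(0)]]
d = len(lam)

def generators(lam):
    """Generators of M: x^i = 1 + lam^{ij}, x^j = -1 (matrix a with a^{ij} = 1)."""
    gens = []
    for i in range(d):
        for j in range(d):
            if i != j:
                x = [Fraction(0)] * d
                x[i] = 1 + lam[i][j]
                x[j] = Fraction(-1)
                gens.append(x)
    return gens

def in_dual_by_inequalities(w, lam):
    """w^j - (1 + lam^{ij}) w^i <= 0 for all i, j."""
    return all(w[j] - (1 + lam[i][j]) * w[i] <= 0
               for i in range(d) for j in range(d))

def in_dual_by_definition(w, lam):
    """w x >= 0 on every generator of M, hence on the whole cone."""
    return all(sum(wi * xi for wi, xi in zip(w, x)) >= 0 for x in generators(lam))

candidates = [[Fraction(1)] * 3, [Fraction(1), Fraction(2), Fraction(1)],
              [Fraction(-1), Fraction(-1), Fraction(-1)]]
for w in candidates:
    print(w, in_dual_by_inequalities(w, lam), in_dual_by_definition(w, lam))
```

Since every element of the cone is a non-negative combination of the generators, testing the scalar product on the generators is enough; this is why the two membership tests agree.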

We introduce also the solvency cone (in values)
$$K_t(\omega) := \Big\{ x \in \mathbf R^d : \exists\, a \in \mathbf M_+^d \text{ such that } x^i + \sum_{j=1}^{d} \big[a^{ji} - (1 + \lambda_t^{ij}(\omega)) a^{ij}\big] \ge 0,\ i \le d \Big\},$$
i.e. $K_t(\omega) = M_t(\omega) + \mathbf R_+^d$. The negative holdings of a position vector in $K_t(\omega)$ can be liquidated (under transaction costs given by $(\lambda_t^{ij}(\omega))$) to get a position vector in $\mathbf R_+^d$.

Let $\mathcal B$ be the set of all processes $B = (B_t)$ with $\Delta B_t \in L^0(-M_t, \mathcal F_t)$. It is an easy exercise on measurable selection to check that $\Delta B_t$ can be represented using a certain $\mathcal F_t$-measurable transfer matrix $\alpha_t$. Thus, the set of portfolio processes in the “value domain” coincides with the set of processes $V = V^{v,B}$, $B \in \mathcal B$, given by the system of linear difference equations
$$\Delta V_t^i = V_{t-1}^i \Delta Y_t^i + \Delta B_t^i, \quad V_{-1}^i = v^i, \tag{3}$$
with
$$\Delta Y_t^i = \frac{\Delta S_t^i}{S_{t-1}^i}, \quad Y_0^i = 1. \tag{4}$$
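The recursion (3)–(4) is easy to simulate. The following sketch is illustrative only and rests on stated assumptions: (3)–(4) are read as increment equations, the convention $S_{-1} = S_0$ is used, and the prices, cost matrix and transfers are hypothetical two-asset data. Besides iterating (3)–(4), it checks at each date that the holdings in physical units, $V_t^i / S_t^i$, change only through the transfers, by the amount $\Delta B_t^i / S_t^i$; this follows from (3) after dividing by $S_t^i$.

```python
from fractions import Fraction as F

# Hypothetical two-asset data: prices S_0..S_2 (S_0^i = 1) and constant costs.
S = [[F(1), F(1)], [F(2), F(1, 2)], [F(3), F(1)]]
lam = [[F(0), F(1, 10)], [F(1, 5), F(0)]]
# alpha[t][j][i]: amount transferred from position j to position i at date t.
alpha = [[[F(0), F(1)], [F(0), F(0)]],
         [[F(0), F(0)], [F(1, 4), F(0)]],
         [[F(0), F(1, 2)], [F(0), F(0)]]]
v = [F(5), F(0)]          # initial endowment V_{-1}, in values
d, T = 2, 2

def dB(t, i):
    """Increment in position i from the transfers: inflow minus costly outflow."""
    inflow = sum(alpha[t][j][i] for j in range(d))
    outflow = sum((1 + lam[i][j]) * alpha[t][i][j] for j in range(d))
    return inflow - outflow

V_prev = v[:]                                   # V_{t-1}
Vhat_prev = [v[i] / S[0][i] for i in range(d)]  # physical units, S_{-1} = S_0
for t in range(T + 1):
    S_prev = S[t - 1] if t > 0 else S[0]
    # Equations (3)-(4): dV = V_{t-1} dY + dB with dY = dS / S_{t-1}.
    V = [V_prev[i] + V_prev[i] * (S[t][i] - S_prev[i]) / S_prev[i] + dB(t, i)
         for i in range(d)]
    # Physical domain: dVhat = dB / S_t.
    Vhat = [Vhat_prev[i] + dB(t, i) / S[t][i] for i in range(d)]
    assert Vhat == [V[i] / S[t][i] for i in range(d)]
    V_prev, Vhat_prev = V, Vhat

print(V_prev, Vhat_prev)
```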


Remark 3.4 Using the notations introduced at the beginning of this section, we can rewrite these equations in the integral form
$$V = v + V_- \bullet Y + B, \tag{5}$$
with
$$Y^i = 1 + (1/S_-^i) \cdot S^i, \tag{6}$$
which remains the same also for the continuous-time version but with a different meaning of the symbols, see [34], [39].

It is easier to study no-arbitrage properties of the model working in the “physical domain” where the portfolio evolves only because of the agent’s actions. Indeed, the dynamics of $\hat V$ is simpler:
$$\Delta \hat V_t^i = \frac{\Delta B_t^i}{S_t^i}.$$
This equation is obvious because of its financial interpretation but one can check it formally (e.g., using the product formula).

Put $\hat M_t(\omega) := \varphi_t(\omega) M_t(\omega)$ and introduce the solvency cone (in physical units)
$$\hat K_t(\omega) := \varphi_t K_t(\omega) = \hat M_t(\omega) + \mathbf R_+^d.$$
Every process $b$ with $\Delta b_t \in L^0(-\hat M_t, \mathcal F_t)$, $0 \le t \le T$, defines a portfolio process $\hat V$ with $\Delta \hat V = \Delta b$ and the zero initial endowment. All portfolio processes (in physical units) can be obtained in this way. The notations $R_T^0$ and $\hat R_T^0$ are obvious.

Lemma 3.5 The following conditions are equivalent:
(a) $R_T^0 \cap L^0(K_T, \mathcal F_T) \subseteq L^0(\partial K_T, \mathcal F_T)$;
(b) $R_T^0 \cap L^0(\mathbf R_+^d, \mathcal F_T) = \{0\}$;
(c) $\hat R_T^0 \cap L^0(\mathbf R_+^d, \mathcal F_T) = \{0\}$.

Proof The equivalence of (b) and (c) is obvious. The implication (a) ⇒ (b) holds because $\mathbf R_+^d \setminus \{0\}$ is a subset of int $K_T$. To prove the remaining implication (b) ⇒ (a) we notice that if $V_T^B \in L^0(K_T, \mathcal F_T)$ where $B \in \mathcal B$ then there exists $B' \in \mathcal B$ such that $V_T^{B'} \in L^0(\mathbf R_+^d, \mathcal F_T)$ and $V_T^{B'}(\omega) \ne 0$ on the set $\{V_T^B(\omega) \notin \partial K_T(\omega)\}$. To construct such $B'$, it is sufficient to modify only $\Delta B_T$ by combining the last transfer with the liquidation of the negative positions.

In accordance with [41] we shall say that the market has the weak no-arbitrage property at the date $T$ ($\mathrm{NA}_T^w$) if one of the equivalent conditions of the above lemma is fulfilled. Apparently, $\mathrm{NA}_T^w$ implies $\mathrm{NA}_t^w$ for all $t \le T$.

Lemma 3.6 Assume that $\Omega$ is finite. Then $\hat R_T^0 \cap L^0(\mathbf R_+^d, \mathcal F_T) = \{0\}$ if and only if there exists a $d$-dimensional martingale $Z$ with strictly positive components such that $Z_t \in L^0(\hat M_t^*, \mathcal F_t)$.

Proof The cone $\hat R_T^0$ is polyhedral. In virtue of Theorem 2.1 the first condition is equivalent to the existence of a strictly positive random variable $\eta$ such that $E\eta\zeta \le 0$ for all $\zeta \in \hat R_T^0$. Let $Z_t = E(\eta | \mathcal F_t)$. Since $L^0(-\hat M_t, \mathcal F_t) \subseteq \hat R_T^0$, the inequality $E Z_t \zeta \ge 0$ holds for all $\zeta \in L^0(\hat M_t, \mathcal F_t)$ implying that $Z_t \in L^0(\hat M_t^*, \mathcal F_t)$. If the second condition of the lemma is fulfilled, we can take $\eta = Z_T$.

Let $\mathcal D_T$ be the set of martingales $Z = (Z_t)$ such that $\hat Z_t \in L^0(K_t^*, \mathcal F_t)$, where $\hat Z_t := \varphi_t Z_t$. The following result from [41] is a simple corollary of the above criteria:

Theorem 3.7 Assume that $\Omega$ is finite. Then $\mathrm{NA}_T^w$ holds if and only if there exists a process $Z \in \mathcal D_T$ with strictly positive components.

This result contains the Harrison–Pliska theorem. Indeed, in the case where all $\lambda^{ij} = 0$, the cone $K = \tilde K := \{x \in \mathbf R^d : x\mathbf 1 \ge 0\}$ and $K^* = \mathbf R_+ \mathbf 1$. Thus, for $Z \in \mathcal D_T$ all components of the process $\hat Z$ are equal. If, e.g., the first asset is the numéraire, then $\hat Z^1 = Z^1$ is a martingale as well as the processes $S^i Z^1$, $i = 2, \dots, d$, i.e. $Z^1$ is a martingale density.

Remark 3.8 For models with transaction costs other types of arbitrage may be of interest. E.g., it is quite natural to consider the ordering induced by the cone $\tilde K := \{x \in \mathbf R^d : x\mathbf 1 \ge 0\}$ (corresponding to the absence of transaction costs), see a criterion in [41] which can be obtained along the same lines as above.

Remark 3.9 It is easily seen that
$$\hat M_t(\omega) = \Big\{ y \in \mathbf R^d : \exists\, c \in \mathbf M_+^d \text{ such that } y^i = \sum_{j=1}^{d} \big[\pi_t^{ij}(\omega) c^{ij} - c^{ji}\big],\ i \le d \Big\}, \tag{7}$$
where
$$\pi_t^{ij} := (1 + \lambda_t^{ij}) S_t^j / S_t^i, \quad 1 \le i, j \le d. \tag{8}$$
One can start the modeling by specifying instead of the process $(\lambda_t^{ij})$ the process $(\pi_t^{ij})$ with values in the set of non-negative matrices with units on the diagonal. Defining directly the set of processes $\hat V$ with $\Delta \hat V_t \in L^0(-\hat M_t, \mathcal F_t)$ and the set of “results” $\hat R_T^0$, one can get Lemma 3.6 immediately. The advantage of this approach is that the existence of the reference asset (i.e. of the price process $S$) is not assumed and we have a model of “pure exchange”. A question arises when such a model can be reduced to a transaction costs model with a reference asset, i.e. under what conditions on the matrix $(\pi^{ij})$ one can find a matrix $(\lambda^{ij})$ with positive entries and a vector $S$ with strictly positive entries satisfying the relation (8).

3.4 The Dalang–Morton–Willinger theorem

Let us consider again the classical model of a frictionless market but now without any assumption on the stochastic basis.

Theorem 3.10 The following conditions are equivalent:
(a) $R_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$ (no-arbitrage);
(b) $A_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$;
(c) $A_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$ and $A_T^0 = \bar A_T^0$, the closure in $L^0$;
(d) $\bar A_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$;
(e) for every probability measure $P' \sim P$ there is a measure $\tilde P \sim P$ such that $d\tilde P/dP' \le \text{const}$ and $S \in \mathcal M(\tilde P)$;
(f) there is a probability measure $\tilde P \sim P$ such that $S \in \mathcal M(\tilde P)$;
(g) there is a probability measure $\tilde P \sim P$ such that $S \in \mathcal M_{loc}(\tilde P)$.

It seems that these equivalent conditions (among many others) are the most essential ones to be collected in a single theorem. The equivalence of (a), (e), and (f), relating a “financial property” of absence of arbitrage with important “probabilistic” properties, is due to Dalang, Morton, and Willinger [8]. Their approach is based on a reduction to a one-stage problem which is very simple for the case of trivial initial σ-algebra; regular conditional distributions and a measurable selection theorem allow us to extend the arguments to treat the general case, see [53], [29], and [58] for other implementations of the same idea. Formally, the equivalence (a) ⇔ (f) is exactly the same as the Harrison–Pliska theorem and one could think that it is just the same result under the relaxed hypothesis on $\Omega$. In fact, such a conclusion seems to be superficial: the equivalent “functional-analytic property” (c), discovered by Schachermayer in [56], shows clearly the profound difference between these two situations. Schachermayer’s condition opens the door to an extensive use of geometric functional analysis in the discrete-time setting which was reserved previously only for continuous-time models. It is quite interesting to notice that the set $R_T^0$ is always closed while $A_T^0$ is not. The condition (d) introduced by Stricker in [60] also gives a hint on an appropriate use of separation arguments. Specifically, the Kreps–Yan theorem (see the Appendix) can be applied to separate $A_T^0 \cap L^1(P')$ from $L_+^1(P') = L^1(\mathbf R_+, P')$ where the measure $P' \sim P$ can be chosen arbitrarily: this freedom allows us to obtain an “equivalent separating measure” with a desired property.

Notice that the crucial implication (b) ⇒ (d) seems to be easier to prove than (a) ⇒ (c), see [36] where a kind of “linear algebra” with random coefficients was suggested. The literature provides a variety of other equivalent conditions complementing the list of the above theorem. Some of them are interesting and non-trivial. A family of conditions is related to various classes of admissible strategies $B$ (which is the set of all predictable processes in our formulation). Since the sets $R_T^0$ and $A_T^0$ depend on this class, so does the no-arbitrage property. It happens, however, that the latter is quite “robust”: e.g., it remains the same if we consider as admissible only the strategies with non-negative value processes. The problem of admissibility is not of great importance since we assume a finite time horizon. The situation is radically different for continuous-time models where one must deal with the doubling strategies which allow us to win even betting on a martingale.

Proof of Theorem 3.10 The implications (a) ⇒ (b) and (c) ⇒ (d) are obvious as well as the chain (e) ⇒ (f) ⇒ (g). To prove the implication (d) ⇒ (e) we observe that the two properties are invariant under the equivalent change of measure. Thus, we may assume that $P' = P$ and, moreover, by passing to the measure $c e^{-\eta} P$ with $\eta = \sup_{t \le T} |S_t|$, that all $S_t$ are integrable. The set $\bar A_T^0 \cap L^1$ is closed in $L^1$ and intersects with $L_+^1$ only at zero. By the Kreps–Yan theorem there is a $\tilde P$ with $d\tilde P/dP \in L^\infty$ such that $\tilde E\xi \le 0$ for all $\xi \in \bar A_T^0 \cap L^1$. Taking $\xi = \pm H_t \Delta S_t$ where $H_t$ is bounded and $\mathcal F_{t-1}$-measurable, we conclude that $S$ is a $\tilde P$-martingale.

The implication (g) ⇒ (a) is also easy. If $H \cdot S_t \ge 0$ for all $t \le T$, then, by the Fatou lemma, the local $\tilde P$-martingale $H \cdot S$ is a $\tilde P$-supermartingale and, therefore, $\tilde E H \cdot S_T \le 0$, i.e. $H \cdot S_T = 0$. In other words, there is no arbitrage in the class of strategies with non-negative value processes. This implies (a) since for any arbitrage opportunity $H$ there is an arbitrage opportunity $H'$ with non-negative value process. Indeed, if $P(H \cdot S_s \le -b) > 0$ for some $s < T$ and $b > 0$, then one can take $H' = I_{]s,T] \times \{H \cdot S_s \le -b\}} H$.

In the proof of the “difficult” implication (b) ⇒ (c) we follow [42].

Lemma 3.11 Let $\eta^n \in L^0(\mathbf R^d)$ be such that $\underline\eta := \liminf |\eta^n| < \infty$. Then there are $\tilde\eta^k \in L^0(\mathbf R^d)$ such that for all $\omega$ the sequence of $\tilde\eta^k(\omega)$ is a convergent subsequence of the sequence of $\eta^n(\omega)$.

Proof Let $\tau_0 := 0$ and $\tau_k := \inf\{n > \tau_{k-1} : ||\eta^n| - \underline\eta| \le 1/k\}$. Then $\tilde\eta^{k,0} := \eta^{\tau_k}$ is in $L^0(\mathbf R^d)$ and $\sup_k |\tilde\eta^{k,0}| < \infty$. Working further with the sequence of $\tilde\eta^{n,0}$ we construct, applying the above procedure to the first component, a sequence of $\tilde\eta^{k,1}$ with the convergent first component and such that for all $\omega$ the sequence of $\tilde\eta^{k,1}(\omega)$ is a subsequence of the sequence of $\tilde\eta^{n,0}(\omega)$. Passing on each step to the newly created sequence of random variables and to the next component we arrive at a sequence with the desired properties.

To show that $A_T^0$ is closed we proceed by induction. Let $T = 1$. Suppose that $H_1^n \Delta S_1 - r^n \to \zeta$ a.s., where $H_1^n$ is $\mathcal F_0$-measurable and $r^n \in L_+^0$. It is sufficient to find $\mathcal F_0$-measurable random variables $\tilde H_1^k$ convergent a.s. and $\tilde r^k \in L_+^0$ such that $\tilde H_1^k \Delta S_1 - \tilde r^k \to \zeta$ a.s.

Let $\Omega_i \in \mathcal F_0$ form a finite partition of $\Omega$. Obviously, we may argue on each $\Omega_i$ separately as on an autonomous measure space (considering the restrictions of random variables and traces of σ-algebras).

Let $\underline H_1 := \liminf |H_1^n|$. On $\Omega_1 := \{\underline H_1 < \infty\}$ we take, using Lemma 3.11, $\mathcal F_0$-measurable $\tilde H_1^k$ such that $\tilde H_1^k(\omega)$ is a convergent subsequence of $H_1^n(\omega)$ for every $\omega$; $\tilde r^k$ are defined correspondingly. Thus, if $\Omega_1$ is of full measure, the goal is achieved.

On $\Omega_2 := \{\underline H_1 = \infty\}$ we put $G_1^n := H_1^n/|H_1^n|$ and $h_1^n := r^n/|H_1^n|$ and observe that $G_1^n \Delta S_1 - h_1^n \to 0$ a.s. By Lemma 3.11 we find $\mathcal F_0$-measurable $\tilde G_1^k$ such that $\tilde G_1^k(\omega)$ is a convergent subsequence of $G_1^n(\omega)$ for every $\omega$. Denoting the limit by $\tilde G_1$, we obtain that $\tilde G_1 \Delta S_1 = \tilde h_1$ where $\tilde h_1$ is non-negative, hence, in virtue of (b), $\tilde G_1 \Delta S_1 = 0$. As $\tilde G_1(\omega) \ne 0$, there exists a partition of $\Omega_2$ into $d$ disjoint subsets $\Omega_2^i \in \mathcal F_0$ such that $\tilde G_1^i \ne 0$ on $\Omega_2^i$. Define $\bar H_1^n := H_1^n - \beta^n \tilde G_1$ where $\beta^n := H_1^{ni}/\tilde G_1^i$ on $\Omega_2^i$. Then $\bar H_1^n \Delta S_1 = H_1^n \Delta S_1$ on $\Omega_2$. We repeat the procedure on each $\Omega_2^i$ with the sequence $\bar H_1^n$ knowing that $\bar H_1^{ni} = 0$ for all $n$. Apparently, after a finite number of steps we construct the desired sequence.

Let the claim be true for $T - 1$ and let $\sum_{t=1}^{T} H_t^n \Delta S_t - r^n \to \zeta$ a.s., where $H_t^n$ are $\mathcal F_{t-1}$-measurable and $r^n \in L_+^0$. By the same arguments based on the elimination of non-zero components of the sequence $H_1^n$ and using the induction hypothesis we replace $H_t^n$ and $r^n$ by $\tilde H_t^k$ and $\tilde r^k$ such that $\tilde H_1^k$ converges a.s. This means that the problem is reduced to the one with $T - 1$ steps.
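The index construction in the proof of Lemma 3.11 can be traced at a single fixed $\omega$. The sketch below is only an illustration under explicit assumptions: the sequence of norms is a hypothetical one with a known lower limit, and that lower limit is passed in by the caller because it cannot be computed from finitely many terms. It shows the first step of the proof, which extracts a bounded subsequence from an unbounded sequence; the subsequent component-by-component refinement is not reproduced.

```python
def select_indices(eta_abs, liminf, count):
    """tau_k := inf{n > tau_{k-1} : | |eta_n| - liminf | <= 1/k}, for k = 1..count.

    eta_abs(n) returns |eta_n| for n >= 1; `liminf` is the known value of
    lim inf |eta_n| at the fixed omega, supplied by the caller.
    """
    taus, n = [], 0
    for k in range(1, count + 1):
        n += 1
        while abs(eta_abs(n) - liminf) > 1 / k:
            n += 1
        taus.append(n)
    return taus

# Hypothetical sequence at a fixed omega: |eta_n| = 1 + 1/n for even n and
# n + 2 for odd n, so lim inf |eta_n| = 1 while the sequence is unbounded.
def eta_abs(n):
    return 1 + 1 / n if n % 2 == 0 else float(n + 2)

taus = select_indices(eta_abs, 1.0, 5)
print(taus, [eta_abs(n) for n in taus])
```

In the lemma the indices are random variables, and the point of the construction is that they are measurable because they are defined through $\underline\eta$ and the $\eta^n$ alone; the code only mimics the pathwise arithmetic.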

4 No-arbitrage criteria in continuous time Nowadays, in the era of electronic trading, there are no doubts that continuous-time models are much more important than their discrete-time relatives. As a theoretical tool, differential equations (eventually, stochastic) show enormous advantage with respect to difference equations. Easy to analyze, they provide very precise description of various phenomena and, quite often, allow for tractable closed-form solutions. As we mentioned already, the mathematical finance started from a continuous-time model. The unprecedented success of the Black–Scholes formula


Yu. M. Kabanov

confirmed that such models are adequate tools to describe financial market phenomena. The current trend is to go beyond the Black–Scholes world. Statistical tests for financial data reject the hypothesis that prices evolve as processes with continuous sample paths. A much better approximation can be obtained by stable or other types of Lévy processes. Apparently, semimartingales provide a natural framework for the discussion of general concepts of financial theory such as arbitrage and hedging problems. More general processes have also been tried; however, a very weak form of absence of arbitrage (namely, the NFLVR property for simple integrands) in the case of a locally bounded price process already implies that it is a semimartingale, see Theorem 7.2 in [12].

4.1 No Free Lunch and separating measure

In this subsection we explain relations between the No Free Lunch (NFL) condition due to Kreps, No Free Lunch with Bounded Risk (NFLBR) due to Delbaen, and No Free Lunch with Vanishing Risk (NFLVR) introduced by Delbaen and Schachermayer (see [48], [10], [12]).

Let us assume that in a one-step model of a frictionless market admissible strategies are such that the convex cone $R_T^0$ (the set of final portfolio values corresponding to zero initial endowment) contains only (scalar) random variables bounded from below. As usual, let $A_T^0 := R_T^0 - L^0(\mathbf{R}_+)$. Define the set $C := A_T^0 \cap L^\infty$. We denote by $\bar C$, $\tilde C^*$, and $\bar C^*$ the norm closure, the union of weak* closures of denumerable subsets, and the weak* closure of $C$ in $L^\infty$; $C_+ := C \cap L^\infty_+$, etc.

The properties NA, NFLVR, NFLBR, and NFL mean that $C_+ = \{0\}$, $\bar C_+ = \{0\}$, $\tilde C^*_+ = \{0\}$, and $\bar C^*_+ = \{0\}$, respectively. Consecutive inclusions induce the hierarchy of these properties:

$C \subseteq \bar C \subseteq \tilde C^* \subseteq \bar C^*$

NA ⇐ NFLVR ⇐ NFLBR ⇐ NFL.

Define the ESM (Equivalent Separating Measure) property as follows: there exists $\tilde P \sim P$ such that $\tilde E \xi \le 0$ for all $\xi \in R_T^0$. The following criterion for the NFL property was established by Kreps.

Theorem 4.1 NFL ⇔ ESM.

Proof (⇐) Let $\xi \in \bar C^* \cap L^\infty_+$. Since $d\tilde P/dP \in L^1$, there are $\xi^n \in C$ with $\tilde E \xi^n \to \tilde E \xi$. By definition, $\xi^n \le \zeta^n$ where $\zeta^n \in R_T^0$. Thus, $\tilde E \xi^n \le 0$ implying that $\tilde E \xi \le 0$ and $\xi = 0$.

(⇒) Since $\bar C^* \cap L^\infty_+ = \{0\}$, the Kreps–Yan separation theorem given in the

1. Arbitrage Theory


Appendix provides $\tilde P \sim P$ such that $\tilde E \xi \le 0$ for all $\xi \in C$, hence, for all $\xi \in R_T^0$.

4.2 Semimartingale model

Let $(\Omega, \mathcal{F}, \mathbf{F} = (\mathcal{F}_t), P)$ be a stochastic basis, i.e. a probability space equipped with a filtration $\mathbf{F}$ satisfying the "usual conditions". Assume for simplicity that the initial $\sigma$-algebra is trivial, the time horizon $T$ is finite, and $\mathcal{F}_T = \mathcal{F}$. A process $X = (X_t)_{t \in [0,T]}$ (right-continuous and with left limits) is a semimartingale if it can be represented as a sum of a local martingale and a process of bounded variation.

Let $\mathcal{U}_1$ be the set of all predictable processes $h$ taking values in the interval $[-1, 1]$. We denote by $h \cdot S$ the stochastic integral of a predictable process $h$ with respect to a semimartingale $S$. The definition of this integral in its full generality, especially for vector processes (necessary for financial applications), is rather complicated and we refer the reader to textbooks on stochastic calculus. The linear space $\mathcal{S}$ of semimartingales starting from zero is a Fréchet space with the quasinorm

$D(X) := \sup_{h \in \mathcal{U}_1} E(1 \wedge |h \cdot X_T|)$

which induces the Émery topology, [17].

We fix in $\mathcal{S}$ a closed convex subset $\mathcal{X}^1$ of processes $X \ge -1$ which contains 0 and satisfies the following condition: for any $X, Y \in \mathcal{X}^1$ and for any non-negative bounded predictable processes $H$, $G$ with $HG = 0$ the process $Z := H \cdot X + G \cdot Y$ belongs to $\mathcal{X}^1$ if $Z \ge -1$. Put $\mathcal{X} := \mathrm{cone}\, \mathcal{X}^1$. The set $\mathcal{X}$ is interpreted as the set of value processes. Put $R_T^0 := \{X_T : X \in \mathcal{X}\}$. In this rather general semimartingale model we have NFLVR ⇔ NFLBR ⇔ NFL in virtue of the following:

Theorem 4.2 Under NFLVR $C = \bar C^*$.

The proof of this theorem given in [34] follows closely the arguments of the Delbaen–Schachermayer paper [12]. Their setting is based on an $n$-dimensional price process $S$; the admissible strategies $H$ are predictable $\mathbf{R}^n$-valued processes for which the stochastic integrals $H \cdot S$ are defined and bounded from below. The set $\mathcal{X}^1$ of all value processes $H \cdot S \ge -1$ is closed in virtue of the Mémin theorem on closedness in $\mathcal{S}$ of the space of stochastic integrals [50].

If $S$ is bounded then the process $H = \xi I_{]s,t]}$ is admissible for arbitrary $\xi \in L^\infty(\mathbf{R}^n, \mathcal{F}_s)$, and hence $\tilde E \xi (S_t - S_s) \le 0$ for any separating measure $\tilde P$. In fact, there is equality


here because one can change the sign of $\xi$. Thus, if $S$ is bounded then it is a martingale with respect to any separating measure $\tilde P$. It is an easy exercise to check that if $S$ is locally bounded (i.e. if there exists a sequence of stopping times $\tau_k$ increasing to infinity such that the stopped processes $S^{\tau_k}$ are bounded) then $S$ is a local martingale with respect to $\tilde P$.

The case of arbitrary, not necessarily bounded $S$ is of special interest because the semimartingale model includes the classical discrete-time model as a particular case. The corresponding theorem, also due to Delbaen–Schachermayer [14], involves the notions of a $\sigma$-martingale and an equivalent $\sigma$-martingale measure. A semimartingale $S$ is a $\sigma$-martingale (notation: $S \in \Sigma_m$) if $G \cdot S \in \mathcal{M}_{loc}$ for some $G$ with values in $]0, 1]$. The property EσMM means that there is $Q \sim P$ such that $S \in \Sigma_m(Q)$.

Theorem 4.3 Let $\mathcal{X}^1$ be the set of stochastic integrals $H \cdot S \ge -1$. Then NFLVR ⇔ NFLBR ⇔ NFL ⇔ ESM ⇔ EσMM.

The remaining non-trivial implication ESM ⇒ EσMM follows from

Theorem 4.4 Let $\tilde P$ be a separating measure. Then for any $\varepsilon > 0$ there is $Q \sim \tilde P$ with $\mathrm{Var}(\tilde P - Q) \le \varepsilon$ such that $S$ is a $\sigma$-martingale under $Q$.

A brief account of the Delbaen–Schachermayer theory including a short proof of the above theorem based on the inequality for the total variation distance from [40] is given in [33].

4.3 Hedging theorem and optional decomposition

Let us consider the semimartingale model based on an $n$-dimensional price process $S$. Let $C$ be a scalar random variable bounded from below and let

$\Gamma := \{x \in \mathbf{R} : \exists \text{ admissible } H \text{ such that } x + H \cdot S_T \ge C\}$.

In other words, $\Gamma$ is the set of initial endowments for which one can find an admissible strategy such that the terminal value of the corresponding portfolio dominates (super-replicates) the contingent claim $C$. "Admissible" means that the portfolio process is bounded from below by a constant. Obviously, if non-empty, $\Gamma$ is a semi-infinite interval. The following "hedging" theorem gives its characterization. Let $\mathcal{Q}$ be the set of probability measures $Q \sim P$ with respect to which $S$ is a local martingale.


Theorem 4.5 Assume that $\mathcal{Q} \neq \emptyset$. Then $\Gamma = [x_*, \infty[$ where $x_* = \sup_{Q \in \mathcal{Q}} E_Q C$.

This general formulation is due to Kramkov [47] who noticed that the assertion is a simple corollary of the following two results.

Theorem 4.6 Assume that $\mathcal{Q} \neq \emptyset$. Let $X$ be a process bounded from below which is a supermartingale with respect to any $Q \in \mathcal{Q}$. Then there is an admissible strategy $H$ and an increasing process $A$ such that $X = X_0 + H \cdot S - A$.

The process $H \cdot S$, being bounded from below, is a local martingale with respect to every $Q \in \mathcal{Q}$ (the property that an integral with respect to a local martingale is also a local martingale if it is one-side bounded is due to Émery for the scalar case and to Ansel and Stricker [1] for the vector case). Thus, this decomposition resembles that of Doob–Meyer but it holds simultaneously for the whole set $\mathcal{Q}$; in general, it is non-unique and $A$ may not be predictable but only adapted, hence, $A$, being right-continuous, is optional. This explains why the above result is usually referred to as the optional decomposition theorem. It was proved in [47] for the case where $S$ is locally bounded; this assumption was removed in the paper [18]. The proof in [18] is probabilistic and provides an interpretation of the integrand $H$ as the Lagrange multiplier. Alternative proofs with intensive use of functional analysis can be found in [13]. For an optional decomposition with constraints see [20]; an extended discussion of the problem is given in [19]. In [43] it is shown that if $P \in \mathcal{Q}$ then the subset of $\mathcal{Q}$ formed by the measures with bounded densities is dense in $\mathcal{Q}$; this result implies, in particular, that, without any hypothesis, the subset of (local) martingale measures with bounded entropy is dense in $\mathcal{Q}$.

Proposition 4.7 Assume that $C$ is such that $\sup_{Q \in \mathcal{Q}} E_Q C < \infty$. Then there exists a process $X$ which is a supermartingale with respect to every $Q \in \mathcal{Q}$ such that $X_t = \operatorname{ess\,sup}_{Q \in \mathcal{Q}} E_Q(C | \mathcal{F}_t)$.

This result is due to El Karoui and Quenez [16]; its proof can also be found in [47].
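Proof of Theorem 4.5 The inclusion $\Gamma \subseteq [x_*, \infty[$ is obvious: if $x + H \cdot S_T \ge C$ then $x \ge E_Q C$ for every $Q \in \mathcal{Q}$ (the process $H \cdot S$, being a local martingale bounded from below, is a supermartingale under every $Q \in \mathcal{Q}$). To show the opposite inclusion we may suppose that $\sup_{Q \in \mathcal{Q}} E_Q C < \infty$ (otherwise both sets are empty). Applying the optional decomposition theorem to the process $X_t = \operatorname{ess\,sup}_{Q \in \mathcal{Q}} E_Q(C | \mathcal{F}_t)$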


we get that $X = x_* + H \cdot S - A$. Since $x_* + H \cdot S_T \ge X_T = C$, the result follows.
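The identity $x_* = \sup_{Q \in \mathcal{Q}} E_Q C$ of Theorem 4.5 can be checked by hand in a toy setting. The following is a minimal sketch, assuming a hypothetical one-period market with three states (prices, strike and grids are illustrative choices, not taken from the text); it compares the cheapest super-replicating endowment with the supremum of expected payoffs over a grid of equivalent martingale measures.

```python
from fractions import Fraction as F

# Hypothetical one-period trinomial market (not from the text): S_0 = 1 and
# S_T takes one of three values; the claim is a call with strike 1.
S0 = F(1)
STATES = [F(1, 2), F(1), F(2)]
PAYOFF = [max(s - S0, F(0)) for s in STATES]


def martingale_measure(t):
    """Weights (q_down, q_mid, q_up) with q_up = t.

    Solving q_down + q_mid + q_up = 1 and E_Q[S_T] = S_0 for these three
    states gives q_down = 2t and q_mid = 1 - 3t, so 0 < t < 1/3.
    """
    return (2 * t, 1 - 3 * t, t)


def sup_expected_payoff(n):
    """Maximum of E_Q[C] over a grid of n - 1 equivalent martingale measures."""
    best = F(0)
    for k in range(1, n):
        q = martingale_measure(F(k, 3 * n))
        # Sanity checks: strictly positive weights, total mass 1, martingale property.
        assert all(w > 0 for w in q) and sum(q) == 1
        assert sum(w * s for w, s in zip(q, STATES)) == S0
        best = max(best, sum(w * c for w, c in zip(q, PAYOFF)))
    return best


def cheapest_endowment(h):
    """Smallest x with x + h * (S_T - S_0) >= C in every state, for fixed h."""
    return max(c - h * (s - S0) for s, c in zip(STATES, PAYOFF))


if __name__ == "__main__":
    # Minimise the endowment over a grid of stock holdings h.
    x_star = min(cheapest_endowment(F(k, 30)) for k in range(-30, 61))
    print("super-replication cost on the grid:", x_star)
    print("sup of E_Q C on the grid:", sup_expected_payoff(300))
```

In this toy market the supremum is approached but not attained by equivalent measures, which is consistent with the hedging set being the closed interval $[x_*, \infty[$.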

4.4 Semimartingale model with transaction costs

In this model it is assumed that the price process is a semimartingale $S$ with non-negative components. The dynamics of the value process $V = V^{v,B}$ is given by the linear stochastic equation

$V = v + V_- \bullet Y + B$

where

$Y^i = (1/S^i_-) \cdot S^i$, $B^i := \sum_{j=1}^d L^{ji} - \sum_{j=1}^d (1 + \lambda^{ij}) L^{ij}$,

and $L^{ji}$ is an increasing right-continuous process representing the accumulated net wealth "arriving" at position $i$ from position $j$.

At this level of generality, criteria of absence of arbitrage are still not available but the paper of Jouini and Kallal [30] is an important contribution to the subject. It provides an NFL criterion for the model of a stock market with a bid–ask spread where, instead of transaction cost coefficients, two processes are given, $\underline{S}$ and $\bar S$, describing the evolution of the selling and buying prices. It is shown that a certain (specifically formulated) NFL property holds if and only if there exist a probability measure $\tilde P \sim P$ and a process $S$ whose components evolve between the corresponding components of $\underline{S}$ and $\bar S$ such that $S$ is a martingale with respect to $\tilde P$. This result is consistent with the NA criteria for finite $\Omega$, see [41]. Apparently, the approach of Jouini and Kallal can be easily extended to the case of currency markets. However, one should take care that the setting of [30] is that of the $L^2$-theory. The limitations of the latter in the context of financial modeling are well known; in contrast with engineering, where energy constraints are welcome, they do not admit an economic interpretation. We draw the reader's attention to the recent paper [32] of the same authors where problems of equilibrium and viability (closely related to absence of arbitrage) are discussed; see also [31] for models with short-sale constraints.

The situation with the hedging theorem is slightly better. Its first versions in [6] (for a two-asset model) and in [34] were established within the $L^2$-framework. In the preprint [38] an attempt was made to work with the class of strategies for which the value process is bounded from below in the sense of the partial ordering induced by the solvency cone. This class of strategies corresponds precisely to the usual definition of admissibility in the case of a frictionless market. However, the result


was proved only for bounded price processes. To avoid difficulties one can look for other reasonable classes of admissible strategies. This approach was exploited in the paper [39] which contains the following hedging theorem.

It is assumed that the matrix of transaction cost coefficients is constant, the first asset is the numéraire, and there exists a probability measure $\tilde P$ such that $S$ is a (true) martingale with respect to $\tilde P$. Let $\mathcal{B}_b$ be the class of strategies $B$ such that the corresponding value processes are bounded from below by a price process multiplied by (negative) constants (this definition resembles that used by Sin in the frictionless case, [55]). In particular, it is admissible to keep short a finite number of units of assets.

Let $\mathcal{D}$ be the set of martingales $Z$ such that $\hat Z$ takes values in $K^*$. Notice that $\{Z : \hat Z = w\rho, \ w \in K^*\} \subseteq \mathcal{D}$ where $\rho_t := E(d\tilde P/dP \,|\, \mathcal{F}_t)$. Moreover, for $Z \in \mathcal{D}$ we have $\hat Z^1 = Z^1$; since the transaction costs are constant, it follows from the inequalities defining $K^*$ that $|\hat Z| \le \kappa Z^1$ for a certain fixed constant $\kappa$. With these remarks it is easy to conclude that $\hat Z V^{v,B}$ is always a supermartingale whatever $Z \in \mathcal{D}$ and $B \in \mathcal{B}_b$ are. Define the convex set of hedging endowments

$\Gamma = \Gamma(\mathcal{B}_b) := \{v \in \mathbf{R}^d : \exists B \in \mathcal{B}_b \text{ such that } V_T^{v,B} \ge_K C\}$

and the closed convex set

$D := \{v \in \mathbf{R}^d : \hat Z_0 v \ge E \hat Z_T C \ \ \forall Z \in \mathcal{D}\}$.

Theorem 4.8 Assume that $S$ is a continuous process and the solvency cone $K$ is proper. Then $\Gamma = D$.

The "easy" inclusion $\Gamma \subseteq D$ holds in virtue of the supermartingale property of $\hat Z V^{v,B}$ even without extra assumptions. The proof of the opposite inclusion given in [39] is based on a bipolar theorem in the space $L^0(\mathbf{R}^d, \mathcal{F}_T)$ equipped with a partial ordering. The hypotheses of the theorem and the structure of admissible strategies are used heavily in this proof. The assumption that $K$ is proper, i.e. the interior of $K^*$ is non-empty, is essential (otherwise, $\Gamma$ may not be closed). However, the assertion $\bar\Gamma = D$ can be established for arbitrary $K$.
How to remove or relax the assumptions on continuity of $S$ to make the result adequate to the hedging theorem without friction remains an open problem.

Remark 4.9 It is important to note that the set of hedging endowments depends on the chosen class of admissible strategies. Let $\mathcal{B}^0$ be the class of buy-and-hold strategies with a single revision of the portfolio, namely, at time zero when the investor enters the market. It happens that in the most popular two-asset model under transaction costs with the price dynamics given by the geometric Brownian


motion where the problem is to hedge a European call option (or, more generally, a contingent claim $C = g(S_T)$) we have $\Gamma(\mathcal{B}_b) = \Gamma(\mathcal{B}^0)$. This astonishing property was conjectured by Davis and Clark [9] and proved independently in [49] and [59], see also [7] and [2] for further generalizations. More precisely, in the mentioned papers it was shown that the investor having the initial endowment in money which is a minimal one to hedge the contingent claim $C$ can hedge it using a buy-and-hold strategy from $\mathcal{B}^0$. In other words, the conclusion was that the point with zero ordinate on the boundary of $\Gamma(\mathcal{B}_b)$ belongs also to the boundary of the smaller set $\Gamma(\mathcal{B}^0)$. In fact, one can extend the arguments and prove that both sets coincide.

5 Large financial markets

5.1 Ross–Huberman APM

The main conclusion of the Capital Asset Pricing Model (CAPM) by Lintner and Sharpe is the following: the mean excess return on an asset is a linear function of its "beta", a measure of risk associated with this asset. More precisely, we have the following result. Assume for simplicity that the riskless asset pays no interest. Suppose that the return on the $i$-th asset has mean $\mu_i$ and variance $\sigma_i^2$, and the market portfolio return has mean $\mu_0$ and variance $\sigma_0^2$. Let $\gamma_i$ be the correlation coefficient between the returns on the $i$-th asset and the market portfolio. Then $\mu_i = \mu_0 \beta_i$ where $\beta_i := \gamma_i \sigma_i / \sigma_0$.

Unfortunately, the theoretical assumptions of CAPM are difficult to justify and its empirical content is dubious. One can expect that the empirical values of $(\beta_i, \mu_i)$ form a cloud around the so-called security market line but this phenomenon is observed only for certain data sets. The alternative approach, the Arbitrage Pricing Model (APM) suggested by Ross in [54] and placed on a solid mathematical basis by Huberman, results in the conclusion that there exists a relation between model parameters, which can be viewed as "approximately linear", giving much better consistency with empirical data.
Based on the idea of asymptotic arbitrage, it attracted considerable attention, see, e.g., [3], [4], [26], [27]; sometimes it is referred to as the Arbitrage Pricing Theory (APT). An important reference is the note by Huberman [25] who gave a rigorous definition of asymptotic arbitrage together with a short and transparent proof of the fundamental result of Ross. The idea of Huberman is to consider a sequence of classical one-step finite-asset models instead of a single one with an infinite number of securities (in the latter case an unpleasant phenomenon may arise similar to that of doubling strategies for models with infinite time horizon). When the number of assets increases to infinity, this sequence of models can be considered as a description of a large financial market.
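For orientation, the CAPM quantities introduced at the start of this subsection can be computed from paired return samples. The following is a minimal sketch, assuming hypothetical sample data and helper names that are not part of the text; it evaluates $\beta_i = \gamma_i \sigma_i / \sigma_0$ from empirical moments and the CAPM prediction $\mu_0 \beta_i$.

```python
import math


def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def capm_beta(asset, market):
    """beta_i = gamma_i * sigma_i / sigma_0 estimated from paired return samples."""
    m_a, m_m = mean(asset), mean(market)
    cov = mean([(a - m_a) * (m - m_m) for a, m in zip(asset, market)])
    sigma_i = math.sqrt(mean([(a - m_a) ** 2 for a in asset]))
    sigma_0 = math.sqrt(mean([(m - m_m) ** 2 for m in market]))
    gamma_i = cov / (sigma_i * sigma_0)  # correlation coefficient
    return gamma_i * sigma_i / sigma_0


def capm_mean_excess_return(mu_0, beta_i):
    """CAPM prediction mu_i = mu_0 * beta_i (riskless asset pays no interest)."""
    return mu_0 * beta_i


if __name__ == "__main__":
    # Hypothetical returns: the asset moves twice as much as the market.
    market = [0.01, -0.02, 0.03, 0.00]
    asset = [2 * m + 0.001 for m in market]
    beta = capm_beta(asset, market)
    print("beta:", beta)
    print("predicted mean excess return:", capm_mean_excess_return(mean(market), beta))
```

Note that $\gamma_i \sigma_i / \sigma_0$ equals the covariance divided by the market variance, so the correlation is computed here only to mirror the formula in the text.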


A general specification of the $n$-th model $M^n$ is as follows. We are given a stochastic basis $(\Omega^n, \mathcal{F}^n, \mathbf{F}^n, P^n)$ with a convex cone $R_T^{0n}$ of square integrable (scalar) random variables. Assume for simplicity that the initial $\sigma$-algebra is trivial, $\mathcal{F}_T^n = \mathcal{F}^n$. Here $T$ stands for "terminal" and can be replaced by 1. As usual, the elements of $R_T^{0n}$ are interpreted as the terminal values of portfolios.

By definition, a sequence $\xi^n \in R_T^{0n}$ realizes an asymptotic arbitrage opportunity (AAO) if the following two conditions are fulfilled ($E^n$ and $D^n$ denote the mean and variance with respect to $P^n$):

(a) $\lim_n E^n \xi^n = \infty$;
(b) $\lim_n D^n \xi^n = \lim_n E^n (\xi^n - E^n \xi^n)^2 = 0$.

Roughly speaking, if AAO exists, then, working with large portfolios, the investor can become infinitely rich (in the mean sense) with vanishing quadratic risk. We say that the large financial market has the NAA property if there are no asymptotic arbitrage opportunities for any subsequence of market models $\{M^n\}$. A simple but useful remark: the NAA property remains the same if we replace (a) in the definition of AAO by the weaker property $\limsup_n E^n \xi^n > 0$ ("if one can become rich, one can become infinitely rich").

Let $\rho^n$ be the (squared) $L^2$-distance of $R_T^{0n}$ from the unit, i.e.

$\rho^n := \inf_{\xi \in R_T^{0n}} E^n (\xi - 1)^2$.

Proposition 5.1 NAA ⇔ $\liminf_n \rho^n > 0$.

Proof (⇒) Assume that $\liminf_n \rho^n = 0$. This means (modulo passage to a subsequence) that there are $\xi^n \in R_T^{0n}$ such that $E^n (\xi^n - 1)^2 \to 0$. It follows from the identity $E^n (\xi^n - 1)^2 = D^n \xi^n + (E^n \xi^n - 1)^2$ that $D^n \xi^n \to 0$ and $E^n \xi^n \to 1$, violating NAA.

(⇐) Assume that NAA fails. This means (modulo passage to a subsequence) that there are $\xi^n \in R_T^{0n}$, $\xi^n \neq 0$, satisfying (a) and (b). It follows that

$E^n (\xi^n)^2 = D^n \xi^n + (E^n \xi^n)^2 \to \infty$.

Put $\tilde\xi^n := \xi^n / \sqrt{E^n (\xi^n)^2}$. Then $\tilde\xi^n \in R_T^{0n}$,

$D^n \tilde\xi^n = (1/E^n (\xi^n)^2) D^n \xi^n \to 0$

and

$(E^n \tilde\xi^n)^2 = E^n (\tilde\xi^n)^2 - D^n \tilde\xi^n = 1 - D^n \tilde\xi^n \to 1$.

Since $E^n \xi^n > 0$ for large $n$, this gives $E^n \tilde\xi^n \to 1$.


Thus,

$E^n (\tilde\xi^n - 1)^2 = D^n \tilde\xi^n + (E^n \tilde\xi^n - 1)^2 \to 0$

and we get a contradiction.

Suppose now that in the $n$-th model we are given a $d$-dimensional square integrable price process $(S_t^n)$ where $t \in \{0, T\}$. In general, $d = d(n)$. Suppose that $S_0^{in} = 1$ (this is just a choice of scales). The crucial hypothesis of the $k$-factor APM is that there are $k$ common sources of randomness affecting the prices of all securities and there are also individual sources of randomness related to each security. Specifically, we suppose that

$\Delta S_T^{in} = \mu^{in} + \sum_{j=1}^k \zeta_j^n b_j^{in} + \eta^{in}$, $i \le d$,

or, in vector notation,

$\Delta S_T^n = \mu^n + \sum_{j=1}^k \zeta_j^n b_j^n + \eta^n$.

Here $\mu^n, b_j^n \in \mathbf{R}^d$, the scalar random variables $\zeta_j^n$ with zero means are square integrable and the $d$-dimensional random vector $\eta^n$ with zero mean has uncorrelated components (representing randomness proper to each asset). Assume that $D\eta^{in} \le C$ for all $i \le d$ and $n \in \mathbf{N}$ for a certain constant $C$.

A (self-financing) portfolio strategy $H^n$ is a vector in $\mathbf{R}^d$ such that

$H^n 1_d := \sum_{i=1}^d H^{in} = 0$.

At the final date the corresponding portfolio value is

$V_T^n = H^n \Delta S_T^n = \sum_{i=1}^d H^{in} \Delta S_T^{in}$

and these random variables form the set $R_T^{0n}$.

Lemma 5.2 Let $L_n$ be the linear subspace in $\mathbf{R}^d$ spanned by the set $\{1_d, b_j^n, j \le k\}$ and let $c^n$ be the projection of $\mu^n$ onto $L_n^\perp$. Then

NAA ⇒ $\sup_n |c^n| < \infty$.

Proof Let $a_n$ be a real number. The vector $H^n := a_n c^n$ (being orthogonal to $1_d$) is a self-financing strategy with the corresponding terminal value $V_T^n = a_n |c^n|^2 + a_n c^n \eta^n$.


It follows that

$E^n V_T^n = a_n |c^n|^2$, $D^n V_T^n = a_n^2 E (c^n \eta^n)^2 = a_n^2 \sum_{i=1}^d (c^{in})^2 D^n \eta^{in} \le C a_n^2 |c^n|^2$.

In particular, for $a_n = |c^n|^{-3/2}$ we have $E^n V_T^n = |c^n|^{1/2}$ and $D^n V_T^n \le C |c^n|^{-1}$, that is, an asymptotic arbitrage opportunity for any subsequence along which $|c^n|$ converges to infinity.

As is easily seen from the proof, the conditions of the lemma are equivalent if $D^n \eta^{in} \ge \varepsilon > 0$ for all $i$ and $n$.

Proposition 5.3 Assume that NAA holds. Then there exist a constant $A$ and real-valued sequences $\{r^n\}$, $\{g_j^n\}$, $j \le k$, such that

$\left| \mu^n - r^n 1_d - \sum_{j=1}^k g_j^n b_j^n \right|^2 := \sum_{i=1}^d \left( \mu^{in} - r^n - \sum_{j=1}^k g_j^n b_j^{in} \right)^2 \le A$.

The assertion is an obvious corollary of the above lemma: the vector $c^n$ is the difference of $\mu^n$ and the projection of $\mu^n$ onto $L_n$; the latter is a linear combination of the generating vectors $1_d, b_1^n, \ldots, b_k^n$. Of course, if the generators are not linearly independent, the coefficients $r^n, g_1^n, \ldots, g_k^n$ are not uniquely defined.

The most interesting case of the APM is the "stationary" one where all random variables "live" on the same probability space and do not depend on $n$. All model parameters also do not depend on $n$ except the dimension $d = n$. In other words, we are given infinite-dimensional vectors $\mu = (\mu_1, \mu_2, \ldots)$, $\eta = (\eta_1, \eta_2, \ldots)$, etc., and the ingredients of the $n$-th model, $\mu^n$, $\eta^n$, etc., are composed of the first $n$ coordinates of these vectors. One can think that the "real-world" market has an infinite number of securities, enumerated somehow, and the agent uses the first $n$ of them in his portfolios. That is, the increment of the $n$-dimensional price process in the $n$-th model is

$\Delta S_T^i = \mu_i + \sum_{j=1}^k \zeta_j b_j^i + \eta_i$, $i \le n$.

Theorem 5.4 Assume that NAA holds. Then there are constants $r$ and $g_j$, $j \le k$, such that

$\sum_{i=1}^\infty \left( \mu_i - r - \sum_{j=1}^k g_j b_j^i \right)^2 < \infty$.


Proof Let us consider the vector space spanned by the infinite-dimensional vectors $1_\infty = (1, 1, \ldots)$, $b_j = (b_j^1, b_j^2, \ldots)$, $j \le k$. Without loss of generality we may assume that $1_\infty, b_j$, $j \le l$, is a basis in this space. There is $n_0$ such that for every $n \ge n_0$ the vectors formed by the first $n$ components of the latter are linearly independent. For every $n \ge n_0$ we define the set

$K^n := \left\{ (r, g_1, \ldots, g_l, 0, \ldots, 0) \in \mathbf{R}^{k+1} : \sum_{i=1}^n \left( \mu_i - r - \sum_{j=1}^k g_j b_j^i \right)^2 \le A \right\}$

where choosing $A$ as in Proposition 5.3 ensures that $K^n$ is non-empty. Clearly, $K^n$ is closed and $K^{n+1} \subseteq K^n$. It is easily seen that $K^n$ is bounded (otherwise we could construct a linear relation between the vectors assumed to be linearly independent). Thus, the sets $K^n$ are compact, $\cap_{n \ge n_0} K^n \neq \emptyset$, and the result follows.

In the case where the numéraire is a traded security, say, the first one (i.e. $\Delta S_T^{1n} = 0$), we can take $r^n = 0$ for all $n$ in Proposition 5.3 and $r = 0$ in Theorem 5.4. To see this, we repeat the arguments above with "truncated" price vectors and strategies, the first component being excluded. In this specification an admissible strategy is just a vector from $\mathbf{R}^{d-1}$ and the projection onto the vector with unit coordinates is not needed.

To make the relation between CAPM and APM clear, let us consider the one-factor stationary model where the numéraire is a traded security and the increments of the risky assets (enumerating from zero) are of the following structure:

$\Delta S_T^0 = \mu_0 + b_0 \zeta$,
$\Delta S_T^i = \mu_i + b_i \zeta + \eta_i$, $i \ge 1$,

where all random variables $\zeta$ and $\eta_i$ are uncorrelated and have zero means. Assume that $D\eta_i \le C$. The 0-th asset plays a particular role: all other price movements are conditionally uncorrelated given $\Delta S_T^0$. It can be viewed as a kind of "market portfolio" or "market index". If there is no asymptotic arbitrage, then there exists a constant $g$ such that

$\sum_{i=0}^\infty (\mu_i - g b_i)^2 < \infty$,

i.e. $\mu_i = g b_i + u_i$ where $u_i \to 0$. If the residual $u_0$ is small, then $\mu_0 \approx g b_0$. We can use the latter relation to specify $g$ and conclude that $\mu_i \approx \mu_0 \beta_i$ (at least, for sufficiently large $i$) with $\beta_i := b_i / b_0$. Of course, this reasoning is far from being rigorous: the empirical data, even being in accordance with APM, may or may not follow the conclusion of CAPM.

Note that the approach of APT is based on the assumption that the agents have certain risk-preferences and in the asymptotic setting they may accept the


possibility of large losses with small probabilities; the variance is taken as an appropriate measure of risk. A specific feature of the classical APT is that it does not deal with the problem of existence of equivalent martingale measures which is the key point of the Fundamental Theorem of Asset Pricing. For a long time these two arbitrage theories were considered as unrelated. In [35] an approach was suggested which puts together basic ideas of both of them and allows us to solve the long-standing problem of extension of APT to the continuous-time setting. A brief account of its further development is given in the next subsections.
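The portfolio used in the proof of Lemma 5.2 can be written out explicitly. The following is a minimal sketch under stated assumptions: the helper names, the Gram-Schmidt projection and any data passed in are hypothetical illustrations, not part of the text. It projects $\mu^n$ onto the orthogonal complement of the span of $1_d$ and the factor loadings, sets $H^n = a_n c^n$ with $a_n = |c^n|^{-3/2}$, and returns the mean $a_n |c^n|^2$ and the variance $a_n^2 \sum_i (c^{in})^2 D\eta^{in}$ from the proof.

```python
import math


def dot(x, y):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(x, y))


def project_out(v, basis):
    """Component of v orthogonal to span(basis), via Gram-Schmidt."""
    ortho = []
    for b in basis:
        w = list(b)
        for u in ortho:
            coef = dot(w, u) / dot(u, u)
            w = [wi - coef * ui for wi, ui in zip(w, u)]
        if dot(w, w) > 1e-12:  # skip (numerically) dependent generators
            ortho.append(w)
    r = list(v)
    for u in ortho:
        coef = dot(r, u) / dot(u, u)
        r = [ri - coef * ui for ri, ui in zip(r, u)]
    return r


def lemma_portfolio(mu, factors, eta_var):
    """Strategy H = a*c with a = |c|^(-3/2); returns (H, mean, variance).

    mu      : mean increments of the d assets
    factors : list of factor-loading vectors b_j
    eta_var : variances of the idiosyncratic terms eta^i
    Assumes c is non-zero, i.e. mu is not in the span of 1_d and the b_j.
    """
    ones = [1.0] * len(mu)
    c = project_out(mu, [ones] + factors)
    norm_c = math.sqrt(dot(c, c))
    a = norm_c ** -1.5
    holdings = [a * ci for ci in c]
    mean_value = a * norm_c ** 2
    variance = a * a * sum(ci * ci * v for ci, v in zip(c, eta_var))
    return holdings, mean_value, variance


if __name__ == "__main__":
    d = 30
    mu = [float(i % 3) for i in range(d)]  # hypothetical mean increments
    b = [float(i) for i in range(d)]       # hypothetical single factor loading
    H, m, v = lemma_portfolio(mu, [b], [1.0] * d)
    print("sum of holdings:", sum(H))
    print("mean:", m, "variance:", v)
```

The holdings sum to zero (self-financing) and are orthogonal to every factor loading, so only the idiosyncratic terms contribute to the variance, which is what makes the mean grow like $|c^n|^{1/2}$ while the variance bound decays like $C/|c^n|$.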

5.2 Asymptotic arbitrage and contiguity

The theory of large financial markets contains four principal ingredients: basic concepts, functional-analytic methods, probabilistic results, and analysis of specific models. The fundamentals of this theory were established in [35] where the definitions of asymptotic arbitrage of the first and the second kind were suggested. Assuming the uniqueness of equivalent martingale measures (i.e. completeness) for each market model, the authors proved necessary and sufficient conditions for NAA1 and NAA2 in terms of contiguity of sequences of equivalent martingale measures and objective ("historical") probabilities. A particular model of a "large Black–Scholes market" (where the price processes are correlated geometric Brownian motions) was investigated. It was shown that the boundedness condition similar to that of Ross–Huberman can be obtained as a direct application of the Liptser–Shiryaev criteria of contiguity in terms of the Hellinger processes.

The restrictive uniqueness hypothesis was removed by Klein and Schachermayer (see [45], [46], and [44]). They discovered the importance of duality methods of geometric functional analysis in the context of large financial markets and found non-trivial extensions of the NAA1 and NAA2 criteria for the case of incomplete market models. These criteria were complemented in [37] by new ones. In particular, it was shown that strong asymptotic arbitrage is equivalent to the complete asymptotic separability of the historical probabilities and equivalent martingale measures. Our presentation follows the latter paper, where several modifications of classical models were also analyzed and necessary and sufficient conditions for absence of asymptotic arbitrage were obtained in terms of model specifications.
In the terminology of [37], a large financial market is a sequence of ordinary semimartingale models of a frictionless market $\{(\mathcal{B}^n, S^n, T^n)\}$, where $\mathcal{B}^n$ is a stochastic basis with the trivial initial $\sigma$-algebra. A semimartingale price process $S^n$ takes values in $\mathbf{R}^d$ for some $d = d(n)$. To simplify notation we shall often omit the superscript for the time horizon.


We denote by $\mathcal{Q}^n$ the set of all probability measures $Q^n$ equivalent to $P^n$ such that $S^n$ is a local martingale with respect to $Q^n$. It is assumed that each set $\mathcal{Q}^n$ of equivalent local martingale measures is non-empty.

We define a trading strategy on $(\mathcal{B}^n, S^n, T^n)$ as a predictable process $H^n$ with values in $\mathbf{R}^d$ such that the stochastic integral $H^n \cdot S^n$ with respect to the semimartingale $S^n$ is well-defined on $[0, T]$. For a trading strategy $H^n$ and an initial endowment $x^n$ the value process is $V^n = V(n, x^n, H^n) := x^n + H^n \cdot S^n$.

A sequence $V^n$ realizes asymptotic arbitrage of the first kind (AA1) if

(1a) $V_t^n \ge 0$ for all $t \le T$;
(1b) $\lim_n V_0^n = 0$ (i.e. $\lim_n x^n = 0$);
(1c) $\lim_n P^n(V_T^n \ge 1) > 0$.

A sequence $V^n$ realizes asymptotic arbitrage of the second kind (AA2) if

(2a) $V_t^n \le 1$ for all $t \le T$;
(2b) $\lim_n V_0^n > 0$;
(2c) $\lim_n P^n(V_T^n \ge \varepsilon) = 0$ for any $\varepsilon > 0$.

A sequence $V^n$ realizes strong asymptotic arbitrage of the first kind (SAA1) if

(3a) $V_t^n \ge 0$ for all $t \le T$;
(3b) $\lim_n V_0^n = 0$ (i.e. $\lim_n x^n = 0$);
(3c) $\lim_n P^n(V_T^n \ge 1) = 1$.

One can continue and give also the definition of SAA2. It is easy to understand that the existence of SAA1 implies the existence of SAA2 and vice versa (provided that there are no specific constraints). So existence criteria are the same in both cases.

A large security market $\{(\mathcal{B}^n, S^n, T^n)\}$ has no asymptotic arbitrage of the first kind (respectively, of the second kind) if for any subsequence $(m)$ there are no value processes $V^m$ realizing asymptotic arbitrage of the first kind (respectively, of the second kind) for $\{(\mathcal{B}^m, S^m, T^m)\}$.

To formulate the results we need to extend some notions from measure theory. Let $\mathcal{Q} = \{Q\}$ be a family of probabilities on a measurable space $(\Omega, \mathcal{F})$. Define the upper and lower envelopes of measures from $\mathcal{Q}$ as the set functions with

$\bar Q(A) := \sup_{Q \in \mathcal{Q}} Q(A)$, $\underline{Q}(A) := \inf_{Q \in \mathcal{Q}} Q(A)$, $A \in \mathcal{F}$.

We say that $\mathcal{Q}$ is dominated if any element of $\mathcal{Q}$ is absolutely continuous with respect to some fixed probability measure. In our setting, where for every $n$ a family $\mathcal{Q}^n$ of equivalent local martingale measures is given, we use the obvious notations $\bar Q^n$ and $\underline{Q}^n$.
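The upper and lower envelopes defined above are set functions but, in general, not measures. The following is a minimal sketch on a finite sample space, assuming a hypothetical two-element family of probabilities (the function names and data are illustrative only); it shows that the upper envelope can be strictly subadditive and the lower envelope strictly superadditive on disjoint events.

```python
from fractions import Fraction as F


def probability(q, event):
    """Q(A) for a probability q given as a dict {outcome: weight}."""
    return sum(q[w] for w in event)


def upper_envelope(family, event):
    """sup over the family of Q(A)."""
    return max(probability(q, event) for q in family)


def lower_envelope(family, event):
    """inf over the family of Q(A)."""
    return min(probability(q, event) for q in family)


if __name__ == "__main__":
    # Hypothetical family of two probabilities on the sample space {0, 1, 2}.
    q1 = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}
    q2 = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
    family = [q1, q2]
    a, b = {0}, {1}
    print("upper:", upper_envelope(family, a), upper_envelope(family, b),
          upper_envelope(family, a | b))
    print("lower:", lower_envelope(family, a), lower_envelope(family, b),
          lower_envelope(family, a | b))
```

The two envelopes are conjugate in the sense that the lower envelope of an event equals one minus the upper envelope of its complement, which can be verified directly on such examples.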


Generalizing in a straightforward way the well-known notion of contiguity to set functions other than measures, we introduce the following definitions.

The sequence $(P^n)$ is contiguous with respect to $(\bar Q^n)$ (notation: $(P^n) \triangleleft (\bar Q^n)$) when the implication

$\lim_{n \to \infty} \bar Q^n(A^n) = 0 \ \Rightarrow \ \lim_{n \to \infty} P^n(A^n) = 0$

holds for any sequence $A^n \in \mathcal{F}^n$, $n \ge 1$. Obviously, $(P^n) \triangleleft (\bar Q^n)$ if and only if the implication

$\lim_{n \to \infty} \sup_{Q \in \mathcal{Q}^n} E_Q g^n = 0 \ \Rightarrow \ \lim_{n \to \infty} E_{P^n} g^n = 0$

holds for any uniformly bounded sequence $g^n$ of positive $\mathcal{F}^n$-measurable random variables.

A sequence $(P^n)$ is asymptotically separable from $(\bar Q^n)$ (notation: $(P^n) \,\triangle\, (\bar Q^n)$) if there exists a subsequence $(m)$ with sets $A^m \in \mathcal{F}^m$ such that

$\lim_{m \to \infty} \bar Q^m(A^m) = 0$, $\lim_{m \to \infty} P^m(A^m) = 1$.

Proposition 5.5 The following conditions are equivalent:

(a) there is no asymptotic arbitrage of the first kind (NAA1);
(b) $(P^n) \triangleleft (\bar Q^n)$;
(c) there exists a sequence $R^n \in \mathcal{Q}^n$ such that $(P^n) \triangleleft (R^n)$.

Proof (b) ⇒ (a) Let $(V^n)$ be a sequence of value processes realizing asymptotic arbitrage of the first kind. For any $Q \in \mathcal{Q}^n$ the process $V^n$ is a non-negative local $Q$-martingale, hence a $Q$-supermartingale, and

$\sup_{Q \in \mathcal{Q}^n} E_Q V_T^n \le \sup_{Q \in \mathcal{Q}^n} E_Q V_0^n = x^n \to 0$

by (1b). Thus,

$\bar Q^n(V_T^n \ge 1) := \sup_{Q \in \mathcal{Q}^n} Q(V_T^n \ge 1) \to 0$

and, by contiguity $(P^n) \triangleleft (\bar Q^n)$, we have $P^n(V_T^n \ge 1) \to 0$ in contradiction to (1c).

(a) ⇒ (b) Assume that $(P^n)$ is not contiguous with respect to $(\bar Q^n)$. Taking, if necessary, a subsequence we can find sets $\Gamma^n \in \mathcal{F}^n$ such that $\bar Q^n(\Gamma^n) \to 0$, $P^n(\Gamma^n) \to \gamma$ as $n \to \infty$ where $\gamma > 0$. According to Proposition 4.7 the process $X_t^n = \operatorname{ess\,sup}_{Q \in \mathcal{Q}^n} E_Q(I_{\Gamma^n} | \mathcal{F}_t^n)$ is a supermartingale with respect to any $Q \in \mathcal{Q}^n$. By Theorem 4.6 it admits a decomposition $X^n = X_0^n + H^n \cdot S^n - A^n$ where $A^n$ is an increasing process. Let


us show that $V^n := X_0^n + H^n \cdot S^n$ are value processes realizing AA1. Indeed, $V^n = X^n + A^n \ge 0$,

$V_0^n = \sup_{Q \in \mathcal{Q}^n} E_Q I_{\Gamma^n} = \bar Q^n(\Gamma^n) \to 0$,

and

$\lim_n P^n(V_T^n \ge 1) \ge \lim_n P^n(X_T^n \ge 1) = \lim_n P^n(X_T^n = 1) = \lim_n P^n(\Gamma^n) = \gamma > 0$.

(b) ⇔ (c) This relation follows from the convexity of $\mathcal{Q}^n$ and a general result given below.

Proposition 5.6 Assume that for any $n \ge 1$ we are given a probability space $(\Omega^n, \mathcal{F}^n, P^n)$ with a dominated family $\mathcal{Q}^n$ of probability measures. Then the following conditions are equivalent:

(a) $(P^n) \triangleleft (\bar Q^n)$;
(b) there is a sequence $R^n \in \mathrm{conv}\, \mathcal{Q}^n$ such that $(P^n) \triangleleft (R^n)$;
(c) the following equality holds:

$\lim_{\alpha \downarrow 0} \liminf_{n \to \infty} \sup_{Q \in \mathrm{conv}\, \mathcal{Q}^n} H(\alpha, Q, P^n) = 1$,

where $H(\alpha, Q, P) = \int (dQ)^\alpha (dP)^{1-\alpha}$ is the Hellinger integral of order $\alpha \in \,]0, 1[$.

The sequence of sets of probability measures $(\mathcal{Q}^n)$ is said to be weakly contiguous with respect to $(P^n)$ (notation: $(\mathcal{Q}^n) \triangleleft_w (P^n)$) if for any $\varepsilon > 0$ there are $\delta > 0$ and a sequence of measures $Q^n \in \mathcal{Q}^n$ such that for any sequence $A^n \in \mathcal{F}^n$ with the property $\limsup_n P^n(A^n) < \delta$ we have $\limsup_n Q^n(A^n) < \varepsilon$.

For the case where the sets $\mathcal{Q}^n$ are singletons containing only the measure $Q^n$, the relation $(\mathcal{Q}^n) \triangleleft_w (P^n)$ means simply that $(Q^n) \triangleleft (P^n)$. Obviously, the property $(\mathcal{Q}^n) \triangleleft_w (P^n)$ can be formulated in terms of random variables: for any $\varepsilon > 0$ there are $\delta > 0$ and a sequence of measures $Q^n \in \mathcal{Q}^n$ such that for any sequence of $\mathcal{F}^n$-measurable random variables $g^n$ taking values in the interval $[0, 1]$ with the property $\limsup_n E_{P^n} g^n < \delta$, we have $\limsup_n E_{Q^n} g^n < \varepsilon$.

Proposition 5.7 The following conditions are equivalent:

(a) there is no asymptotic arbitrage of the second kind (NAA2);
(b) $(\underline{Q}^n) \triangleleft (P^n)$;
(c) $(\mathcal{Q}^n) \triangleleft_w (P^n)$.


The proof of Proposition 5.7 is similar to that of Proposition 5.5. Notice that the conditions (b) in both statements look rather symmetric in contrast to the conditions (c). In general, the condition (b) of Proposition 5.7 may hold though a sequence $Q^n \in \mathcal Q^n$ such that $(Q^n) \lhd (P^n)$ does not exist (see an example in [45]). The reason is that the set functions $\bar Q$ and $\underline Q$ are of a radically different nature. The following assertion gives criteria of existence of strong asymptotic arbitrage.

Proposition 5.8 The following conditions are equivalent:
(a) there is SAA1;
(b) $(P^n) \,\triangle\, (\bar Q^n)$;
(c) $(\underline Q^n) \,\triangle\, (P^n)$;
(d) $(P^n) \,\triangle\, (Q^n)$ for any sequence $Q^n \in \mathcal Q^n$.

Let $P$ and $\tilde P$ be two equivalent probability measures on a stochastic basis $\mathbf B$ and let $R := (P + \tilde P)/2$. Let us denote by $z$ and $\tilde z$ the density processes of $P$ and $\tilde P$ with respect to $R$. For arbitrary $\alpha \in\, ]0,1[$ the process $Y = Y(\alpha) := z^{\alpha} \tilde z^{1-\alpha}$ is an $R$-supermartingale admitting the multiplicative decomposition $Y = M\mathcal E(-h)$ where $M = M(\alpha)$ is a local $R$-martingale, $\mathcal E$ is the Doléans-Dade exponential, and $h = h(\alpha, P, \tilde P)$ is an increasing predictable process, $h_0 = 0$, called the Hellinger process of order $\alpha$. These Hellinger processes play an important role in criteria of absolute continuity and, more generally, contiguity of probability measures, see [28] for details.

In the abstract setting of Proposition 5.6, when the probability spaces are equipped with filtrations (i.e. they are stochastic bases), we have the following results, which are helpful in the analysis of particular models arising in mathematical finance.

Theorem 5.9 The following conditions are equivalent:
(a) $(P^n) \lhd (\bar Q^n)$;
(b) for all $\varepsilon > 0$

\[ \lim_{\alpha\downarrow 0}\, \limsup_{n\to\infty}\, \inf_{Q\in\operatorname{conv}\mathcal Q^n} P^n\big(h_\infty(\alpha, Q, P^n) \ge \varepsilon\big) = 0. \]

Theorem 5.10 Assume that the family $\mathcal Q^n$ is convex and dominated for any $n$. Then the following conditions are equivalent:
(a) $(\underline Q^n) \lhd (P^n)$;
(b) for all $\varepsilon > 0$

\[ \lim_{\alpha\downarrow 0}\, \limsup_{n\to\infty}\, \inf_{Q\in\mathcal Q^n} Q\big(h_\infty(\alpha, P^n, Q) \ge \varepsilon\big) = 0. \]
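The Hellinger-type criteria above can be illustrated numerically in a toy setting that is not taken from the text: assume $P^n = N(0,1)^{\otimes n}$ and that $\mathcal Q^n$ is the singleton $Q^n = N(a_1,1)\otimes\cdots\otimes N(a_n,1)$ for a hypothetical shift sequence $(a_i)$. In that case $H(\alpha, Q^n, P^n) = \exp\{-\tfrac12\alpha(1-\alpha)\sum_{i\le n} a_i^2\}$, so the limit in Proposition 5.6(c) equals 1 exactly when $\sum_i a_i^2 < \infty$. The sketch below evaluates the integral at a large but finite $n$, which is only a proxy for the double limit.

```python
import math


def hellinger_integral(alpha, shifts):
    """Hellinger integral H(alpha, Q, P) for P = N(0,1)^n and
    Q = N(a_1,1) x ... x N(a_n,1) (illustrative Gaussian assumption).

    One coordinate contributes exp(-alpha*(1-alpha)*a^2/2) and
    independent coordinates multiply.
    """
    return math.exp(-0.5 * alpha * (1.0 - alpha) * sum(a * a for a in shifts))


def small_alpha_profile(shift_fn, n, alphas=(0.1, 0.01, 0.001)):
    """Evaluate H(alpha, Q^n, P^n) on a decreasing grid of alpha (finite-n proxy)."""
    shifts = [shift_fn(i) for i in range(1, n + 1)]
    return [hellinger_integral(alpha, shifts) for alpha in alphas]


# Hypothetical shift sequences: square-summable versus constant.
summable = small_alpha_profile(lambda i: 1.0 / i, n=100_000)
divergent = small_alpha_profile(lambda i: 1.0, n=100_000)

print(summable)   # increases towards 1 as alpha decreases
print(divergent)  # stays close to 0 for this large n
```

Under these assumptions the first profile suggests contiguity and the second its failure; a rigorous statement still needs the limits in $n$ and $\alpha$ taken in the order given in the proposition.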


The concept of contiguity is useful in relation with an important question whether the option prices calculated in “approximating” models converge to the “true” option price, see [24] and [58].

5.3 A large BS-market

Let $(\Omega, \mathcal F, \mathbf F = (\mathcal F_t), P)$ be a stochastic basis with a countable set of independent one-dimensional Wiener processes $w^i$, $i \in \mathbf Z_+$, $\mathbf w^n = (w^0, \dots, w^n)$, and let $\mathbf F^n = (\mathcal F^n_t)$ be a filtration generated by $\mathbf w^n$. For simplicity, assume that $T$ is fixed. The behavior of the stock prices is described by the following stochastic differential equations:

\[ dX^0_t = \mu_0 X^0_t\,dt + \sigma_0 X^0_t\,dw^0_t, \]
\[ dX^i_t = \mu_i X^i_t\,dt + \sigma_i X^i_t\,(\gamma_i\,dw^0_t + \bar\gamma_i\,dw^i_t), \qquad i \in \mathbf N, \]

with (deterministic strictly positive) initial points $X^i_0$. Here $\gamma_i$ is a function taking values in $[0,1[$ and $\gamma_i^2 + \bar\gamma_i^2 = 1$. We assume that $\mu_i, \sigma_i \in L^2[0,T]$ and $\sigma_i > 0$. Notice that the process $\xi^i$ with

\[ d\xi^i_t = \gamma_i\,dw^0_t + \bar\gamma_i\,dw^i_t, \qquad \xi^i_0 = 0, \]

is a Wiener process. Thus, in the case of constant coefficients price processes are geometric Brownian motions as in the classical case of Black and Scholes. The model is designed to reflect the fact that in the market there are two different types of randomness: the first type is proper to each stock while the second one originates from some common source and it is accumulated in a “stock index” (or “market portfolio”) whose evolution is described by the first equation. Set

\[ \beta_i := \frac{\gamma_i \sigma_i}{\sigma_0} = \frac{\gamma_i \sigma_i \sigma_0}{\sigma_0^2}. \]

In the case of deterministic coefficients, $\beta_i$ is a well-known measure of risk which is the covariance between the return on the asset with number $i$ and the return on the index, divided by the variance of the return on the index. Let $b^n(t) := (b_0(t), b_1(t), \dots, b_n(t))$ where

\[ b_0 := -\frac{\mu_0}{\sigma_0}, \qquad b_i := \frac{\beta_i \mu_0 - \mu_i}{\sigma_i \bar\gamma_i}. \]

Assume that for every $n$

\[ \int_0^T |b^n(t)|^2\,dt < \infty. \]
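For constant coefficients the quantities $\beta_i$, $b_i$ and the integral $\int_0^T |b^n(t)|^2\,dt$ reduce to elementary arithmetic. The following is a minimal sketch under that assumption; the function name and the numerical inputs are hypothetical and serve only to make the definitions concrete.

```python
import math


def beta_and_b(mu0, sigma0, mus, sigmas, gammas, T=1.0):
    """Compute beta_i, b = (b_0, ..., b_n) and T * |b|^2 for constant coefficients.

    gammas[i] must lie in [0, 1); gamma_bar is recovered from
    gamma^2 + gamma_bar^2 = 1.
    """
    betas = []
    b = [-mu0 / sigma0]                      # b_0 := -mu_0 / sigma_0
    for mu_i, sigma_i, gamma_i in zip(mus, sigmas, gammas):
        gamma_bar = math.sqrt(1.0 - gamma_i ** 2)
        beta_i = gamma_i * sigma_i / sigma0  # beta_i := gamma_i sigma_i / sigma_0
        betas.append(beta_i)
        b.append((beta_i * mu0 - mu_i) / (sigma_i * gamma_bar))
    integral = T * sum(x * x for x in b)     # integral of |b^n|^2 over [0, T]
    return betas, b, integral


# Hypothetical market: an index and two stocks.
betas, b, integral = beta_and_b(
    mu0=0.05, sigma0=0.2,
    mus=[0.07, 0.03], sigmas=[0.3, 0.25], gammas=[0.6, 0.0],
)
print(betas, b, integral)
```

An asset whose drift satisfies $\mu_i = \beta_i \mu_0$ contributes $b_i = 0$, so only deviations from that relation enter the integral.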

We consider the stochastic basis $\mathbf B^n = (\Omega, \mathcal F, \mathbf F^n = (\mathcal F^n_t)_{t\le T}, P^n)$ with the $(n+1)$-dimensional semimartingale $S^n := (X^0_t, X^1_t, \dots, X^n_t)$ and $P^n := P|\mathcal F^n_T$. The sequence $\{(\mathbf B^n, S^n, T)\}$ is a large security market. In our case each $(\mathbf B^n, S^n, T)$ is a model of a complete market and the set $\mathcal Q^n$ is a singleton which consists of the measure $Q^n = Z_T(b^n) P^n$ where

\[ Z_T(b^n) := \exp\Big\{ \int_0^T (b^n(t), d\mathbf w^n_t) - \frac12 \int_0^T |b^n(t)|^2\,dt \Big\}. \]

The Hellinger process has an explicit expression

\[ h(\alpha, Q^n, P^n) = \frac{\alpha(1-\alpha)}{2} \int_0^T \Big[ \Big(\frac{\mu_0}{\sigma_0}\Big)^2 + \sum_{i=1}^{n} \Big(\frac{\mu_i - \beta_i \mu_0}{\sigma_i \bar\gamma_i}\Big)^2 \Big]\,ds. \]

As a corollary of Theorem 5.9 we have

Proposition 5.11 The condition NAA1 holds if and only if

\[ \int_0^T \Big[ \Big(\frac{\mu_0}{\sigma_0}\Big)^2 + \sum_{i=1}^{\infty} \Big(\frac{\mu_i - \beta_i \mu_0}{\sigma_i \bar\gamma_i}\Big)^2 \Big]\,ds < \infty. \]

In fact, in this model both conditions NAA1 and NAA2 hold simultaneously. In the particular case of constant coefficients, finite $T$, and $0 < c \le \sigma_i \bar\gamma_i \le C$ we get that the property NAA1 holds if and only if

\[ \sum_{i=1}^{\infty} (\mu_i - \beta_i \mu_0)^2 < \infty, \]

i.e. the Huberman–Ross boundedness is fulfilled.

5.4 One-factor APM revisited

We consider the “stationary” one-factor model of the following specific structure (cf. with the model given at the end of Subsection 5.1). Let $(\epsilon_i)_{i\ge 0}$ be independent random variables given on a probability space $(\Omega, \mathcal F, P)$ and taking values in a finite interval $[-N, N]$, $E\epsilon_i = 0$, $E\epsilon_i^2 = 1$. At time zero all asset prices $S^i_0 = 1$ and

\[ S^0_T = 1 + \mu_0 + \sigma_0 \epsilon_0, \]
\[ S^i_T = 1 + \mu_i + \sigma_i(\gamma_i \epsilon_0 + \bar\gamma_i \epsilon_i), \qquad i \ge 1. \]

The coefficients here are deterministic, $\sigma_i > 0$, $\bar\gamma_i > 0$ and $\gamma_i^2 + \bar\gamma_i^2 = 1$. The asset with number zero is interpreted as a market portfolio, $\gamma_i$ is the correlation coefficient between the rate of return for the market portfolio and the rate of return for the asset with number $i$. For $n \ge 0$ we consider the stochastic basis $\mathbf B^n = (\Omega, \mathcal F^n, \mathbf F^n = (\mathcal F^n_t)_{t\in\{0,1\}}, P^n)$ with the $(n+1)$-dimensional random process $\mathbf S^n := (S^0_t, S^1_t, \dots, S^n_t)_{t\in\{0,1\}}$ where $\mathcal F^n_0$ is the trivial σ-algebra, $\mathcal F^n_1 = \mathcal F^n := \sigma\{\epsilon_0, \dots, \epsilon_n\}$, and $P^n = P|\mathcal F^n$. According to our definition, the sequence $\mathbf M = \{(\mathbf B^n, \mathbf S^n, 1)\}$ is a large security market. Let $\beta_i := \gamma_i \sigma_i / \sigma_0$,

\[ b_0 := -\frac{\mu_0}{\sigma_0}, \qquad b_i := \frac{\mu_0 \beta_i - \mu_i}{\sigma_i \bar\gamma_i}, \qquad i \ge 1. \]

It is convenient to rewrite the price increments as follows:

\[ S^0_T = 1 + \sigma_0(\epsilon_0 - b_0), \]
\[ S^i_T = 1 + \sigma_i \gamma_i(\epsilon_0 - b_0) + \sigma_i \bar\gamma_i(\epsilon_i - b_i), \qquad i \ge 1. \]

The set $\mathcal Q^n$ of equivalent martingale measures for $\mathbf S^n$ has a very simple description: $Q \in \mathcal Q^n$ iff $Q \sim P^n$ and

\[ E_Q(\epsilon_i - b_i) = 0, \qquad 0 \le i \le n, \]

i.e. the $b_i$ are mean values of $\epsilon_i$ under $Q$. Obviously, $\mathcal Q^n \ne \emptyset$ iff $P(\epsilon_i > b_i) > 0$ and $P(\epsilon_i < b_i) > 0$ for all $i \le n$. As usual, we assume that $\mathcal Q^n \ne \emptyset$ for all $n$; this implies, in particular, that $|b_i| < N$. Let $F_i$ be the distribution function of $\epsilon_i$. Put

\[ \underline s_i := \inf\{t : F_i(t) > 0\}, \qquad \bar s_i := \inf\{t : F_i(t) = 1\}, \]

$\underline d_i := b_i - \underline s_i$, $\bar d_i := \bar s_i - b_i$, and $d_i := \underline d_i \wedge \bar d_i$. In other words, $d_i$ is the distance from $b_i$ to the end points of the interval $[\underline s_i, \bar s_i]$.

Proposition 5.12 The following assertions hold:
(a) $\inf_i d_i = 0$ ⇔ SAA ⇔ $(P^n) \,\triangle\, (\bar Q^n)$,
(b) $\inf_i d_i > 0$ ⇔ NAA1 ⇔ $(P^n) \lhd (\bar Q^n)$,
(c) $\limsup_i |b_i| = 0$ ⇔ NAA2 ⇔ $(\underline Q^n) \lhd (P^n)$.

The hypothesis that the distributions of $\epsilon_i$ have finite support is important: it excludes the case where the value of every non-trivial portfolio is negative with positive probability. For the proof of this result, we refer the reader to the original paper [37].
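The quantities $d_i$ of Proposition 5.12 are easy to compute when each $\epsilon_i$ has finitely many atoms, since then $\underline s_i$ and $\bar s_i$ are the smallest and largest atoms. The sketch below assumes such discrete distributions and works on a finite truncation of the asset sequence; the function name and the inputs are hypothetical. A truncation can only indicate the trend of $\inf_i d_i$, it cannot decide the asymptotic statement.

```python
def endpoint_distances(supports, b):
    """Return d_i = min(b_i - s_lower_i, s_upper_i - b_i) for each asset.

    supports[i] lists the atoms of eps_i (values taken with positive
    probability); b[i] is the mean eps_i must have under a martingale measure.
    """
    distances = []
    for atoms, b_i in zip(supports, b):
        lower, upper = min(atoms), max(atoms)
        # An equivalent martingale measure needs mass on both sides of b_i.
        if not (lower < b_i < upper):
            raise ValueError("no equivalent martingale measure for this asset")
        distances.append(min(b_i - lower, upper - b_i))
    return distances


# Hypothetical example: every eps_i is a symmetric coin taking values -1 and +1.
n = 50
coin = [[-1.0, 1.0]] * n

d_bounded = endpoint_distances(coin, [0.5] * n)
d_vanishing = endpoint_distances(coin, [1.0 - 1.0 / (i + 2) for i in range(n)])

print(min(d_bounded))    # stays at 0.5: consistent with case (b)
print(min(d_vanishing))  # shrinks with n: the pattern of case (a)
```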

Appendix: Facts from convex analysis

1 By definition, a subset $K$ in $\mathbf R^n$ (or in a linear space $X$) is a cone if it is convex and stable under multiplication by the non-negative constants. It defines the partial ordering:

\[ x \ge_K y \quad\Leftrightarrow\quad x - y \in K; \]

in particular, $x \ge_K 0$ means that $x \in K$. A closed cone $K$ is proper if the linear space $F := K \cap (-K) = \{0\}$, i.e. if the relations $x \ge_K 0$ and $x \le_K 0$ imply that $x = 0$. Let $K$ be a closed cone and let $\pi : \mathbf R^n \to \mathbf R^n/F$ be the canonical mapping onto the quotient space. Then $\pi K$ is a proper closed cone.

For a set $C$ we denote by cone $C$ the set of all conic combinations of elements of $C$. If $C$ is convex then cone $C = \cup_{\lambda\ge 0} \lambda C$.

Let $K$ be a cone. Its dual positive cone $K^* := \{z \in \mathbf R^n : zx \ge 0 \ \forall x \in K\}$ is closed (the dual cone $K^\circ$ is defined using the opposite inequality, i.e. $K^\circ = -K^*$); $K$ is closed if and only if $K = K^{**}$. We use the notations int $K$ for the interior of $K$ and ri $K$ for the relative interior (i.e. the interior in $K - K$, the linear subspace generated by $K$).

A closed cone $K$ in the Euclidean space $\mathbf R^n$ is proper if and only if there exists a compact convex set $C$ such that $0 \notin C$ and $K = $ cone $C$. One can take as $C$ the convex hull of the intersection of $K$ with the unit sphere $\{x \in \mathbf R^n : |x| = 1\}$. A closed cone $K$ is proper if and only if int $K^* \ne \emptyset$. We have

\[ \operatorname{ri} K^* = \{w : wx > 0 \ \forall x \in K,\ x \notin F\}; \]

in particular, if $K$ is proper then int $K^* = \{w : wx > 0 \ \forall x \in K,\ x \ne 0\}$.

By definition, the cone $K$ is polyhedral if it is the intersection of a finite number of half-spaces $\{x : p_i x \ge 0\}$, $p_i \in \mathbf R^n$, $i = 1, \dots, N$. The Farkas–Minkowski–Weyl theorem: a cone is polyhedral if and only if it is finitely generated.

The following result is a direct generalization of the Stiemke lemma.

Lemma A.1 Let $K$ and $R$ be closed cones in $\mathbf R^n$. Assume that $K$ is proper. Then

\[ R \cap K = \{0\} \quad\Leftrightarrow\quad (-R^*) \cap \operatorname{int} K^* \ne \emptyset. \]

Proof (⇐) The existence of $w$ such that $wx \le 0$ for all $x \in R$ and $wy > 0$ for all $y$ in $K \setminus \{0\}$ obviously implies that $R$ and $K \setminus \{0\}$ are disjoint.

(⇒) Let $C$ be a convex compact set such that $0 \notin C$ and $K = $ cone $C$. By the separation theorem (for the case where one set is closed and another is compact) there is a non-zero $z \in \mathbf R^n$ such that

\[ \sup_{x\in R} zx < \inf_{y\in C} zy. \]

Since $R$ is a cone, the left-hand side of this inequality is zero, hence $z \in -R^*$ and, also, $zy > 0$ for all $y \in C$. The latter property implies that $zy > 0$ for $y \in K$, $y \ne 0$, and we have $z \in \operatorname{int} K^*$.

In the classical Stiemke lemma $K = \mathbf R^n_+$ and $R = \{y \in \mathbf R^n : y = Bx,\ x \in \mathbf R^d\}$ where $B$ is a linear mapping. Usually, it is formulated as the alternative: either there is $x \in \mathbf R^d$ such that $Bx \ge_K 0$ and $Bx \ne 0$ or there is $y \in \mathbf R^n$ with strictly positive components such that $B^* y = 0$.

Lemma A.1 can be slightly generalized. Let $\pi$ be the natural projection of $\mathbf R^n$ onto $\mathbf R^n/F$.

Theorem A.2 Let $K$ and $R$ be closed cones in $\mathbf R^n$. Assume that the cone $\pi R$ is closed. Then

\[ R \cap K \subseteq F \quad\Leftrightarrow\quad (-R^*) \cap \operatorname{ri} K^* \ne \emptyset. \]

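Proof It is easy to see that $\pi(R \cap K) = \pi R \cap \pi K$ and, hence,

\[ R \cap K \subseteq F \quad\Leftrightarrow\quad \pi R \cap \pi K = \{0\}. \]

By Lemma A.1

\[ \pi R \cap \pi K = \{0\} \quad\Leftrightarrow\quad (-\pi R)^* \cap \operatorname{int} (\pi K)^* \ne \emptyset. \]

Since $(\pi R)^* = \pi^{*-1} R^*$ and int $(\pi K)^* = \pi^{*-1}(\operatorname{ri} K^*)$, the condition in the right-hand side can be written as

\[ \pi^{*-1}\big((-R^*) \cap \operatorname{ri} K^*\big) \ne \emptyset \]

or, equivalently, $(-R^*) \cap \operatorname{ri} K^* \cap \operatorname{Im} \pi^* \ne \emptyset$. But $\operatorname{Im} \pi^* = (K \cap (-K))^* = K^* - K^* \supseteq \operatorname{ri} K^*$ and we get the result.

Notice that if $R$ is polyhedral then $\pi R$ is also polyhedral, hence closed.

2 The following result is referred to as the Kreps–Yan theorem, see [48], [63], [5]. It holds for arbitrary $p \in [1, \infty]$, $p^{-1} + q^{-1} = 1$, but the cases $p = 1$ and $p = \infty$ are the most important.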


Theorem A.3 Let $C$ be a convex cone in $L^p$ closed in $\sigma\{L^p, L^q\}$, containing $-L^p_+$ and such that $C \cap L^p_+ = \{0\}$. Then there is a $\tilde P \sim P$ with $d\tilde P/dP \in L^q$ such that $\tilde E \xi \le 0$ for all $\xi \in C$.

Proof By the Hahn–Banach theorem any non-zero $x \in L^p_+ := L^p(\mathbf R_+, \mathcal F)$ can be separated from $C$: there is a $z_x \in L^q$ such that $E z_x x > 0$ and $E z_x \xi \le 0$ for all $\xi \in C$. Since $C \supseteq -L^p_+$, the latter property yields that $z_x \ge 0$; we may assume $\|z_x\|_q = 1$. By the Halmos–Savage lemma the dominated family $\{P_x = z_x P : x \in L^p_+,\ x \ne 0\}$ contains a countable equivalent family $\{P_{x_i}\}$. But then $z := \sum 2^{-i} z_{x_i} > 0$ and we can take $\tilde P := zP$.

Recall that the Halmos–Savage lemma, though important, is, in fact, very simple. It suffices to prove its claim for the case of a convex family (in our situation we even have this property). A family $\{P_{x_i}\}$ such that the sequence $I_{\{z_{x_i} > 0\}}$ increases to ess sup $I_{\{z_x > 0\}}$ (existing because of convexity) meets the requirement.

The above theorem has the following “purely geometric” version, [5].

Theorem A.4 Suppose $J$ and $K$ are non-empty convex cones in a separable Banach space $X$ such that $J \cap \overline{K - J} = \{0\}$. Then there is a continuous linear functional $z$ such that $zx > 0$ $\forall x \in J$ and $zx \le 0$ $\forall x \in K$.

The first step of the proof is the same as of the previous theorem: the separation of single points allows us to construct the set of $\{z_x \in X',\ x \in K\}$ with unit norms. The second step is to select a countable weak∗ dense subset. This can be done because the separability of $X$ implies that the weak∗-topology on the unit ball of $X'$ (always weak∗ compact) is metrizable.

For the Lebesgue spaces the separability means that the σ-algebra is countably generated. Specific properties of these spaces allow us, by means of the Halmos–Savage lemma, to avoid such an unpleasant assumption on the σ-algebra.

References
[1] Ansel, J.-P. and Stricker, C. (1994), Couverture des actifs contingents. Ann. Inst. Henri Poincaré 30, 2, 303–15.
[2] Bouchard-Denize, B. and Touzi, N. (2001), Explicit solution of the multivariate super-replication problem under transaction costs. Preprint.
[3] Chamberlain, G.
(1983), Funds, factors, and diversification in arbitrage pricing models. Econometrica 51, 5, 1305–23.
[4] Chamberlain, G. and Rothschild, M. (1983), Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 51, 5, 1281–304.
[5] Clark, S.A. (1992), The valuation problem in arbitrage price theory. J. Math. Economics 22, 463–78.
[6] Cvitanić, J. and Karatzas, I. (1996), Hedging and portfolio optimization under transaction costs: a martingale approach. Mathematical Finance 6, 2, 133–65.


[7] Cvitanić, J., Pham, H. and Touzi, N. (1999), A closed form solution to the problem of super-replication under transaction costs. Finance and Stochastics 3, 1, 35–54.
[8] Dalang, R.C., Morton, A. and Willinger, W. (1990), Equivalent martingale measures and no-arbitrage in stochastic securities market model. Stochastics and Stochastic Reports 29, 185–201.
[9] Davis, M.H.A. and Clark, J.M.C. (1994), A note on super-replicating strategies. Philos. Trans. Roy. Soc. London A 347, 485–94.
[10] Delbaen, F. (1992), Representing martingale measures when asset prices are continuous and bounded. Mathematical Finance 2, 107–30.
[11] Delbaen, F., Kabanov, Yu.M. and Valkeila, S. (2001), Hedging under transaction costs in currency markets: a discrete-time model. Mathematical Finance. To appear.
[12] Delbaen, F. and Schachermayer, W. (1994), A general version of the fundamental theorem of asset pricing. Math. Annalen 300, 463–520.
[13] Delbaen, F. and Schachermayer, W. (1999), A compactness principle for bounded sequence of martingales with applications. Proceedings of the Seminar of Stochastic Analysis, Random Fields and Applications, 1999.
[14] Delbaen, F. and Schachermayer, W. (1998), The fundamental theorem of asset pricing for unbounded stochastic processes. Math. Annalen 312, 215–50.
[15] Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel. Hermann, Paris, 1980.
[16] El Karoui, N. and Quenez, M.-C. (1995), Dynamic programming and pricing of contingent claims in an incomplete market. SIAM Journal on Control and Optimization 33, 1, 27–66.
[17] Émery, M. (1979), Une topologie sur l’espace de semimartingales. Séminaire de Probabilités XIII. Lect. Notes Math., 721, 260–80.
[18] Föllmer, H. and Kabanov, Yu.M. (1998), Optional decomposition and Lagrange multipliers. Finance and Stochastics 2, 1, 69–81.
[19] Föllmer, H. and Kabanov, Yu.M. (1996), Optional decomposition theorems in discrete time.
Atti del convegno in onore di Oliviero Lessi, Padova, 25–26 marzo 1996, 47–68.
[20] Föllmer, H. and Kramkov, D.O. (1997), Optional decomposition theorem under constraints. Probability Theory and Related Fields 109, 1, 1–25.
[21] Gordan, P. (1873), Über die Auflösung linearer Gleichungen mit reellen Koefficienten. Math. Annalen 6, 23–8.
[22] Hall, P. and Heyde, C.C. Martingale Limit Theory and Its Applications. Academic Press, New York, 1980.
[23] Harrison, M. and Pliska, S. (1981), Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and their Applications 11, 215–60.
[24] Hubalek, F. and Schachermayer, W. (1998), When does convergence of asset price processes imply convergence of option prices? Mathematical Finance 8, 4, 215–33.
[25] Huberman, G. (1982), A simple approach to arbitrage pricing theory. Journal of Economic Theory 28, 1, 183–91.
[26] Ingersoll, J.E., Jr. (1984), Some results in the theory of arbitrage pricing. Journal of Finance 39, 1021–39.
[27] Ingersoll, J.E., Jr. Theory of Financial Decision Making. Rowman and Littlefield, 1989.
[28] Jacod, J. and Shiryaev, A.N. Limit Theorems for Stochastic Processes. Springer, Berlin–Heidelberg–New York, 1987.
[29] Jacod, J. and Shiryaev, A.N. (1998), Local martingales and the fundamental asset pricing theorem in the discrete-time case. Finance and Stochastics 2, 3, 259–73.
[30] Jouini, E. and Kallal, H. (1995), Martingales and arbitrage in securities markets with


transaction costs. J. Economic Theory 66, 178–97.
[31] Jouini, E. and Kallal, H. (1995), Arbitrage in securities markets with short sale constraints. Mathematical Finance 5, 3, 197–232.
[32] Jouini, E. and Kallal, H. (1999), Viability and equilibrium in securities markets with frictions. Mathematical Finance 9, 3, 275–92.
[33] Kabanov, Yu.M. On the FTAP of Kreps–Delbaen–Schachermayer. Statistics and Control of Random Processes. The Liptser Festschrift. Proceedings of Steklov Mathematical Institute Seminar, World Scientific, 1997, 191–203.
[34] Kabanov, Yu.M. (1999), Hedging and liquidation under transaction costs in currency markets. Finance and Stochastics 3, 2, 237–48.
[35] Kabanov, Yu.M. and Kramkov, D.O. (1994), Large financial markets: asymptotic arbitrage and contiguity. Probability Theory and its Applications 39, 1, 222–9.
[36] Kabanov, Yu.M. and Kramkov, D.O. (1994), No-arbitrage and equivalent martingale measure: an elementary proof of the Harrison–Pliska theorem. Probability Theory and Its Applications 39, 3, 523–7.
[37] Kabanov, Yu.M. and Kramkov, D.O. (1998), Asymptotic arbitrage in large financial markets. Finance and Stochastics 2, 2, 143–72.
[38] Kabanov, Yu.M. and Last, G. (2001a), Hedging in a model with transaction costs. Preprint.
[39] Kabanov, Yu.M. and Last, G. (2001b), Hedging under transaction costs in currency markets: a continuous-time model. Mathematical Finance. To appear.
[40] Kabanov, Yu.M., Liptser, R.Sh. and Shiryayev, A.N. (1981), On the variation distance for probability measures defined on a filtered space. Probability Theory and Related Fields 71, 19–36.
[41] Kabanov, Yu.M. and Stricker, Ch. (2001a), The Harrison–Pliska arbitrage pricing theorem under transaction costs. J. Math. Econ. To appear.
[42] Kabanov, Yu.M. and Stricker, Ch. (2001b), A teachers’ note on no-arbitrage criteria. Séminaire de Probabilités. To appear.
[43] Kabanov, Yu.M. and Stricker, Ch. (2001c), On equivalent martingale measures with bounded densities.
Séminaire de Probabilités. To appear.
[44] Klein, I. (2001), A fundamental theorem of asset pricing for large financial markets. Preprint.
[45] Klein, I. and Schachermayer, W. (1996), Asymptotic arbitrage in non-complete large financial markets. Probability Theory and its Applications 41, 4, 927–34.
[46] Klein, I. and Schachermayer, W. (1996), A quantitative and a dual version of the Halmos–Savage theorem with applications to mathematical finance. Annals of Probability 24, 2, 867–81.
[47] Kramkov, D.O. (1996), Optional decomposition of supermartingales and hedging in incomplete security markets. Probability Theory and Related Fields 105, 4, 459–79.
[48] Kreps, D.M. (1981), Arbitrage and equilibrium in economies with infinitely many commodities. J. Math. Economics 8, 15–35.
[49] Levental, S. and Skorohod, A.V. (1997), On the possibility of hedging options in the presence of transaction costs. The Annals of Applied Probability 7, 410–43.
[50] Mémin, J. (1980), Espace de semimartingales et changement de probabilité. Zeitschrift für Wahrscheinlichkeitstheorie und Verw. Geb. 52, 9–39.
[51] Pshenychnyi, B.N. Convex Analysis and Extremal Problems. Nauka, Moscow, 1980 (in Russian).
[52] Rockafellar, R.T. Convex Analysis. Princeton University Press, Princeton, 1970.
[53] Rogers, L.C.G. (1994), Equivalent martingale measures and no-arbitrage. Stochastic and Stochastics Reports 51, 41–51.


[54] Ross, S.A. (1976), The arbitrage theory of asset pricing. Journal of Economic Theory 13, 1, 341–60.
[55] Sin, C.A. Strictly local martingales and hedge ratios on stochastic volatility models. PhD dissertation, Cornell University, 1996.
[56] Schachermayer, W. (1992), A Hilbert space proof of the fundamental theorem of asset pricing in finite discrete time. Insurance: Mathematics and Economics 11, 249–57.
[57] Shiryaev, A.N. Probability. Springer, Berlin–Heidelberg–New York, 1984.
[58] Shiryaev, A.N. Essentials of Stochastic Finance. World Scientific, Singapore, 1999.
[59] Soner, H.M., Shreve, S.E. and Cvitanić, J. (1995), There is no non-trivial hedging portfolio for option pricing with transaction costs. The Annals of Applied Probability 5, 327–55.
[60] Stricker, Ch. (1990), Arbitrage et lois de martingale. Annales de l’Institut Henri Poincaré. Probabilité et Statistiques 26, 3, 451–60.
[61] Schrijver, A. Theory of Linear and Integer Programming. Wiley, 1986.
[62] Stiemke, E. (1915), Über positive Lösungen homogener linearer Gleichungen. Math. Annalen 76, 340–2.
[63] Yan, J.A. (1980), Caractérisation d’une classe d’ensembles convexes de $L^1$ et $H^1$. Séminaire de Probabilités XIV. Lect. Notes Math., 784, 260–80.

2 Market Models with Frictions: Arbitrage and Pricing Issues
Elyès Jouini and Clotilde Napp

1 Introduction

The Fundamental Theorem of Asset Pricing, which originates in the Arrow–Debreu model (Debreu (1959)) and is further formalized in (among others) Harrison and Kreps (1979), Kreps (1981), Harrison and Pliska (1981), Duffie and Huang (1986), Dybvig and Ross (1987), Dalang, Morton and Willinger (1989), Back and Pliska (1990), Stricker (1990), Delbaen (1992), Lakner (1993) and Delbaen and Schachermayer (1994, 1998), asserts that the absence of free lunch in a frictionless (and complete) securities market model is equivalent to the existence of an equivalent martingale measure for the normalized securities price processes. The only arbitrage free and viable pricing rule on the set of contingent claims, which is a linear space, is then equal to the expected value with respect to the unique equivalent martingale measure.

In this chapter, we study some foundational issues in the theory of asset pricing in general models with flows as well as in securities market models with frictions. We consider financial models, where any investment opportunity is described by the cash flow that it generates. For instance, in such models, the investment opportunity, which consists, in a perfect financial model, of buying at time $t_1$ one unit of a risky asset, whose price process is given by $(S_t)_{t\ge 0}$, and selling at time $t_2$ with $t_1 \le t_2$ the unit bought, is described by the process $(\Phi_t)_{t\ge 0}$ which is null outside $\{t_1, t_2\}$ and which satisfies $\Phi_{t_1} = -S_{t_1}$ and $\Phi_{t_2} = S_{t_2}$.

Sections 2 and 3 deal with a convex cone framework, i.e. a framework where the set of all available investments is a convex cone. A large class of imperfect market models, that we shall denote by I, can fit in this framework: models with imperfections on the numéraire like no borrowing or different borrowing and lending rates, models with dividends, short-sale constraints, convex cone constraints, proportional transaction costs.


Section 2 is devoted to the characterization of the no-free-lunch assumption first in a general convex cone framework with flows, then in all the models with imperfections belonging to I, and is taken from Jouini and Napp (2001) and Napp (2000). We consider first a quite general model; the investment opportunities are not specifically related to the buying and selling of securities on a financial market. The time horizon is not supposed to be finite. The framework is the one of continuous time. We don’t assume that there exists a numéraire, enabling investors to transfer money from one date to another, and even if such possibilities exist, we do not assume that the lending rate is equal to the borrowing rate or that we have possibilities to borrow. It is proved that the absence of free lunch in a general convex cone framework with flows is essentially equivalent to the existence of a discount process such that the “net present value” of any investment opportunity is nonpositive. This result is then applied to obtain the Fundamental Theorem of Asset Pricing for all cases of market imperfections in I. In each case, we find that there is no free lunch if and only if a given specific convex set of discount processes is nonempty. For instance, in the case with short-sale constraints, we find that the absence of free lunch is equivalent to the existence of a discount process such that the discounted price process of any security that cannot be sold short (resp. that can only be sold short) is a supermartingale (resp. a submartingale).

Section 3 is devoted to pricing issues first in a general convex cone framework with flows, then in all the models with imperfections belonging to I, and is taken from Napp (2000). Section 3.1 is in the spirit of Harrison and Kreps (1979); we generalize existing results by considering general investment flows, and by taking almost any kind of imperfection into account.
We consider a “primitive” market, consisting of a certain set of investment opportunities and we want to give a price to an additional contingent flow by using arbitrage considerations. More precisely, we define an admissible price for an additional contingent flow $\Phi$ as a price which is compatible with the assumption of no-arbitrage (or no free lunch) in the “full” market consisting of our primitive market and $\Phi$. For a general contingent flow, we obtain an interval of admissible prices, which is given by the “net present value” of the flow under all admissible discount processes. We then apply this result to obtain arbitrage intervals for the price of contingent claims in market models with frictions in I. Section 3.2 is devoted to the characterization of the obtained arbitrage bounds in terms of superreplication cost. We start by defining in a general model with flows the so-called superreplication cost, which essentially corresponds to the minimum initial wealth needed to cover all future contingent flows. We show that for any contingent flow, it is equal to the upper bound of the arbitrage interval. The notion of superreplication cost was first introduced by Kreps (1981), for classical contingent claims and in the context of incomplete markets (with no


other imperfection). In a diffusion framework, and still with no other imperfection than incompleteness, El Karoui and Quenez (1995) obtain a dual formulation for the superreplication price; in Delbaen (1992) and Delbaen and Schachermayer (1994), this result is obtained in a more general framework. In the spirit of Kreps (1981), Jouini and Kallal (1995a,b) take into account the cases of proportional transaction costs and short sale constraints. For transaction costs, the problem was first introduced by Bensaïd et al. (1992), who show that in a binomial model with transaction costs, perfect replication is not optimal. Cvitanić and Karatzas (1996) give, in a diffusion framework, a dual formulation for the superreplication price. Delbaen, Kabanov and Valkeila (2001) and Kabanov (1999) generalize this result to the multivariate case, in discrete as well as in continuous time, and with a semimartingale price process. For convex constraints, and still in a diffusion framework, the dual formulation is obtained in Cvitanić and Karatzas (1993). In a more general framework, the result is obtained in Föllmer and Kramkov (1997).

Section 4 deals with economies with fixed transaction costs, which do not fall in the preceding framework, since the set of all available investments is not a convex cone. It is adapted from Jouini, Kallal and Napp (2000). We first obtain a characterization of the no-free-lunch assumption in a general model with flows. We find that the assumption of no-arbitrage is essentially equivalent to the existence of a family of nonnegative “discount processes” such that the net present value of any available investment is nonnegative. Then we apply this result to a securities market model where investors are subject to both fixed and proportional transaction costs. In that case, the nonnegative discount processes can be interpreted as absolutely continuous martingale measures.
Finally, we study pricing issues in securities market models with fixed transaction costs. We adopt an axiomatic approach. We define admissible pricing rules on the set of attainable contingent claims as the price functionals that are arbitrage free and are lower than or equal to the superreplication cost. Indeed, no rational agent would pay more than its superreplication cost for a contingent claim since there is a cheaper way to achieve at least the same payoff using a trading strategy. We then show that the only admissible pricing rules on the set of attainable contingent claims are those that are equal to the sum of an expected value with respect to any absolutely continuous martingale measure and of a bounded fixed cost functional.

2 The Fundamental Theorem of Asset Pricing We start by describing our general model with flows in a convex cone framework, and in such a model, we characterize the assumption of no free lunch. Then we apply this result to all cases of market imperfections belonging to the class I.


2.1 In a general “convex cone model” with flows

We adopt the framework of Jouini and Napp (2001), Napp (2000) or Jouini et al. (2000, Section 1). We introduce a few notations. For a filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t\ge 0}, P)$, define the measure space $(\hat\Omega, \hat{\mathcal F}, \hat\mu)$ as the direct sum of the probability spaces $(\Omega, \mathcal F_t, P)$, i.e. $\hat\Omega$ is the disjoint union of continuum many copies $(\Omega_t)_{t\ge 0}$ of $\Omega$, $\hat{\mathcal F}$ is the sigma-algebra of sets $A \subseteq \hat\Omega$ such that $A \cap \Omega_t \in \mathcal F_t$ for each $t \ge 0$, and $\hat\mu$ induces on each $(\Omega_t, \hat{\mathcal F}|_{\Omega_t})$ the original probability measure $P$. We then may represent the Banach space $X \equiv L^1(\hat\Omega, \hat{\mathcal F}, \hat\mu)$ as the space of all families $\Phi = (\Phi_t)_{t\ge 0}$ such that $\Phi_t \in L^1(\Omega, \mathcal F_t, P)$ and

\[ \|\Phi\|_{L^1(\hat\Omega, \hat{\mathcal F}, \hat\mu)} = \sum_{t\ge 0} \|\Phi_t\|_{L^1(\Omega, \mathcal F_t, P)} < \infty. \]

The finiteness of the above sum implies in particular that $\Phi_t = 0$ for all but countably many $t$ in $\mathbf R_+$. The dual space of $X$ may be represented as $Y \equiv L^\infty(\hat\Omega, \hat{\mathcal F}, \hat\mu)$, which is defined as the space of all families $g = (g_t)_{t\ge 0}$ such that $g_t \in L^\infty(\Omega, \mathcal F_t, P)$ and

\[ \|g\|_{L^\infty(\hat\Omega, \hat{\mathcal F}, \hat\mu)} = \sup_{t\ge 0} \|g_t\|_{L^\infty(\Omega, \mathcal F_t, P)} < \infty. \]
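The two norms just defined become plain finite sums and maxima when the state space is finite and the flow is supported on finitely many dates. The sketch below makes that assumption; the dictionary layout (date mapped to a list of per-state values), the two-state probabilities and the numbers are all hypothetical, and adaptedness to the filtration is not checked.

```python
def l1_norm(flow, probs):
    """Norm of a flow in X: the sum over dates of E|Phi_t|."""
    return sum(
        sum(p * abs(x) for p, x in zip(probs, values))
        for values in flow.values()
    )


def linf_norm(g, probs):
    """Norm of a family in Y: the sup over dates of the essential sup of |g_t|."""
    return max(
        abs(x)
        for values in g.values()
        for p, x in zip(probs, values)
        if p > 0  # states of zero probability do not count
    )


# Two equally likely states; buy one unit at date 1, sell it at date 2.
probs = [0.5, 0.5]
flow = {1.0: [-100.0, -100.0], 2.0: [110.0, 95.0]}
weights = {1.0: [1.0, 1.0], 2.0: [0.9, 1.0]}

print(l1_norm(flow, probs))       # 202.5
print(linf_norm(weights, probs))  # 1.0
```

Storing only the dates at which the flow is nonzero mirrors the remark that a flow of finite norm vanishes at all but countably many dates.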

The scalar product is defined by $\langle \Phi, g\rangle_{X,Y} = \sum_{t\ge 0} \langle \Phi_t, g_t\rangle$. Elements of $X$ and $Y$ are defined up to a modification.

Let $\check X \equiv L^1(\hat\Omega_0, \hat{\mathcal{F}}_0, \hat\mu_0)$, where $(\hat\Omega_0, \hat{\mathcal{F}}_0, \hat\mu_0)$ is the direct sum of the probability spaces $(\Omega, \mathcal{F}_t, P)_{t>0}$. Then $\check Y \equiv L^\infty(\hat\Omega_0, \hat{\mathcal{F}}_0, \hat\mu_0)$ denotes the dual space of $\check X$. For $x, y \in X$ or $Y$ (resp. $\check X$ or $\check Y$), we write $x \ge y$ if for all $t \ge 0$ (resp. $t > 0$), $x_t \ge y_t$ a.s. $P$. For any subset $Z$ of $X$, $Y$, $\check X$ or $\check Y$, we denote by $Z_+$ (resp. $Z_-$) the set of $x \in Z$ such that $x \ge 0$ (resp. $x \le 0$).

We consider a model in which agents face investment opportunities described by their cash flows. A probability space $(\Omega, \mathcal{F}, P)$ is specified and fixed. The set $\Omega$ represents all possible states of the world. An information structure, which describes how information is revealed to investors, is given by a filtration $(\mathcal{F}_t)_{t\ge 0}$ satisfying the “usual conditions” and such that $\mathcal{F}_0 = \{\emptyset, \Omega\}$. We consider investments of the following form:

Definition 2.1 An investment is a process $\Phi = (\Phi_t)_{t\ge 0} \in X$. For each $t \ge 0$, the random variable $\Phi_t$ corresponds to the cash flow generated at time $t$ by the investment $\Phi$; if $\Phi_t(\omega) = k$, this means that the investor receives $k$ at date $t$ if $k$ is nonnegative and pays $-k$ at date $t$ if $k$ is nonpositive.

An arbitrage opportunity is as usual a possibility to find an investment that yields a positive gain in some circumstances without a countervailing threat of loss in

2. Arbitrage Pricing with Frictions


other circumstances. In our framework, an arbitrage opportunity would consist of a nonnegative nonnull available investment. We consider a convex cone $J$ of available investments: this amounts to saying that an investor has a right to subscribe to (a finite number of) different investment plans and that he can decide at the starting date of any investment opportunity which amount of this particular investment he wants to buy. We are led to consider convex cones in order to take into account the fact that investors are not necessarily able to sell an investment plan (consider for instance the case of short sale constraints or transaction costs). In order to obtain the Fundamental Theorem of Asset Pricing in this context, we make the additional assumption that there is in the convex cone $J$ some possibility of transferring some money. More precisely, we introduce the following assumption.

Assumption A: there exists a sequence $d = (d_n)_{n\ge 0}$ such that for all $t^* \ge 0$ and all $B_{t^*}$ in $\mathcal{F}_{t^*}$ of positive probability, there exists $\Phi$ in $J$ such that $\Phi_t = 0$ for all $t < t^*$, $\Phi_{t^*} = 0$ outside $B_{t^*}$, $\Phi_t \ge 0$ for all $t > t^*$, and there exists $d_n \in d$ with $P(\Phi_{d_n} > 0) > 0$.

In words, this means that there exists a sequence of trading dates such that, for every date and for every event at that date, there exists an investment plan in our set of available investments that starts at that date and in that event, that can take any value at that date and in that event, but that is nonnegative after that date and nonnull at one date belonging to the above mentioned sequence of dates. This assumption is not too restrictive; see Jouini and Napp (2001) for more details.

We do not specify the elements of $J$ so far. The assumption of no-arbitrage for $J$ can be written $J \cap X_+ = \{0\}$ or equivalently $(J - X_+) \cap X_+ = \{0\}$. Since a free lunch denotes the possibility of getting arbitrarily close to an arbitrage opportunity, we introduce the following definition.
Definition 2.2 There is no free lunch for $J$ if and only if $\overline{J - X_+} \cap X_+ = \{0\}$, where the bar denotes the closure for the norm topology in $X$.

We now characterize the absence of free lunch. Notice that since we do not necessarily have the opportunity to transfer money from one time to another, we cannot consider “net gains” anymore, and we have to get an analog of the Kreps–Yan theorem (Yan (1980), Kreps (1981)) in a more complex space than the classical $L^1(\Omega, \mathcal{F}, P)$ for a probability (or sigma-finite measure) space $(\Omega, \mathcal{F}, P)$. In our general context with investments in $X$, we obtain the following Fundamental Theorem of Asset Pricing.

Theorem 2.3 Under Assumption A, there is no free lunch for $J$ if and only if there exists a positive process $g = (g_t)_{t\ge 0}$ in $Y$ such that $g|_J \le 0$.


Note that positive means here that $g$ seen as a linear functional on $X$ is positive, or equivalently that for all $t$, $g_t > 0$ a.s. $P$. Since for all $\Phi \in J$, $\langle \Phi, g\rangle_{X,Y} = E\left[\sum_{t\ge 0} g_t \Phi_t\right]$, Theorem 2.3 means that the absence of free lunch (for $J$) is essentially equivalent to the existence of a discount process under which the “net present value” of any available investment (in $J$) is nonpositive. We shall denote by $G^J$ the set of all “admissible discount processes”, i.e. $G^J \equiv \{g \in Y,\ g > 0,\ g|_J \le 0\}$. If there is no free lunch, then according to Theorem 2.3, $G^J$ is nonempty.
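To make the net present value criterion concrete, here is a minimal sketch in a hypothetical two-date, two-state model. The probabilities, prices, flows and function names are illustrative assumptions, not taken from the text. It evaluates $\langle \Phi, g\rangle = E[\sum_t g_t \Phi_t]$ on the generators of a small cone and tests whether a candidate positive $g$ has nonpositive value on each of them (checking the generators suffices, since the pairing is linear and the cone consists of nonnegative combinations).

```python
from fractions import Fraction as F

# Hypothetical model: dates t = 0, 1 and two equally likely states.
PROBS = [F(1, 2), F(1, 2)]

def npv(flow, g):
    """E[sum_t g_t * flow_t]; flow and g are lists over dates of lists over states."""
    return sum(p * sum(g[t][w] * flow[t][w] for t in range(len(flow)))
               for w, p in enumerate(PROBS))

def is_admissible(g, generators):
    """True if g is strictly positive and has nonpositive NPV on every generator."""
    positive = all(x > 0 for row in g for x in row)
    return positive and all(npv(phi, g) <= 0 for phi in generators)

# Illustrative generators of the cone of available investments.
buy_stock = [[-100, -100], [120, 90]]   # pay 100 at date 0, receive the stock value at date 1
lend_cash = [[-1, -1], [1, 1]]          # lend one unit at zero interest
borrow_cash = [[1, 1], [-1, -1]]        # borrow one unit at zero interest
generators = [buy_stock, lend_cash, borrow_cash]

# g_1 is the density of the measure (1/3, 2/3) with respect to PROBS.
g_good = [[F(1), F(1)], [F(2, 3), F(4, 3)]]
# This candidate discounts too heavily, so borrowing would have positive NPV.
g_bad = [[F(1), F(1)], [F(1, 2), F(1)]]

print(npv(buy_stock, g_good), is_admissible(g_good, generators))
print(npv(borrow_cash, g_bad), is_admissible(g_bad, generators))
```

Exact rational arithmetic is used so that the boundary case of a zero net present value is detected without rounding issues.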

2.2 Application to the characterization of the no-free-lunch assumption in all cases of market imperfections in $\mathcal{I}$

As our investment opportunities are supposed to be very general, it is shown in Jouini and Napp (1998) that most market models involving imperfections can fit in the model for a specific convex cone of investments $J$ satisfying Assumption A. This is the case for the following set (that we shall denote by $\mathcal{I}$) of imperfect market models: models with imperfections concerning the numéraire (no borrowing, different borrowing and lending rates), models with dividends, short-sale constraints, convex cone constraints, proportional transaction costs.

Let us see how, for instance, Theorem 2.3, obtained in a general setting, can be applied to the case of short sale constraints. As in Jouini and Kallal (1995b), we consider a model of financial market where two sorts of securities can be traded. Short selling the first type of securities is not allowed, i.e. they can only be held in nonnegative amounts, whereas the second type of securities can only be held in nonpositive amounts. The model includes situations where holding negative amounts of a security is possible but costly as well as situations where some (or all) securities are not subject to any constraints, since we may include a security twice in the model, in the first and in the second set of securities. For $1 \le k \le n$ (resp. $n+1 \le k \le N$), we denote by $S^k$ the price process of the security $k$ that can only be held in nonnegative (resp. nonpositive) amounts. We assume that for $k \in \{1, \dots, N\}$, $S^k_t$ belongs to $L^1(\Omega, \mathcal{F}_t, P)$ for all $t$, and that $S^1 \equiv 1$ (i.e. there are lending opportunities). For all $t_1 \le t_2$ and all bounded nonnegative $\mathcal{F}_{t_1}$-measurable real-valued random variables $\theta$, we let $\Phi^{(k;\theta,t_1,t_2)}$ denote the process given by

$$\Phi^{(k;\theta,t_1,t_2)}_t = -\theta S^k_{t_1} 1_{t=t_1} + \theta S^k_{t_2} 1_{t=t_2} \quad \text{for } 1 \le k \le n,$$

$$\Phi^{(k;\theta,t_1,t_2)}_t = \theta S^k_{t_1} 1_{t=t_1} - \theta S^k_{t_2} 1_{t=t_2} \quad \text{for } n+1 \le k \le N.$$

We assume that the set $J_S$ is the convex cone generated by all these investments. Then $J_S$ satisfies Assumption A and by an immediate application of Theorem 2.3, we get that there is no free lunch for $J_S$, or equivalently that there is no free lunch in a model with short sale constraints, if and only if the set $G^{J_S}$ is nonempty, where $G^{J_S}$ denotes the set of positive processes $g \in Y$ such that for all securities $k$ that cannot

2. Arbitrage Pricing with Frictions


be sold short (i.e. $k \le n$), $gS^k$ is a supermartingale and for all securities $k$ that can only be sold short (i.e. $n+1 \le k$), $gS^k$ is a submartingale.

We adopt in Jouini and Napp (2001) a similar approach for all other market imperfections in $\mathcal{I}$. Each time, we introduce a specific set of available investments corresponding to the considered imperfection, we apply Theorem 2.3 and obtain more or less directly a specific characterization of the no-free-lunch condition in these imperfect market models. In each case, we find that there is no free lunch if and only if a given specific convex set of discount processes (see footnote 1) is nonempty.
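The supermartingale and submartingale conditions can be checked directly in a small example. The following sketch assumes a hypothetical one-period, two-state model with trivial information at date 0; the securities, prices and function names are illustrative. In one period, "$gS^k$ is a supermartingale" reduces to $E[g_1 S^k_1] \le g_0 S^k_0$, and the submartingale condition reverses the inequality.

```python
from fractions import Fraction as F

# Hypothetical one-period model with two equally likely states.
PROBS = [F(1, 2), F(1, 2)]

def expect(values):
    """Expectation of a state-indexed list under PROBS."""
    return sum(p * v for p, v in zip(PROBS, values))

def in_G_JS(g0, g1, long_only, short_only):
    """Check g > 0, g*S a supermartingale for long-only securities and a
    submartingale for short-only securities. Each security is a pair
    (price at date 0, list of prices at date 1)."""
    if g0 <= 0 or any(x <= 0 for x in g1):
        return False
    def discounted_mean(s1):
        return expect([g * s for g, s in zip(g1, s1)])
    supers = all(discounted_mean(s1) <= g0 * s0 for s0, s1 in long_only)
    subs = all(discounted_mean(s1) >= g0 * s0 for s0, s1 in short_only)
    return supers and subs

cash = (F(1), [F(1), F(1)])            # S^1 = 1, held in nonnegative amounts
stock_a = (F(100), [F(120), F(90)])    # cannot be sold short
stock_b = (F(50), [F(70), F(40)])      # can only be sold short

long_only = [cash, stock_a]
short_only = [stock_b]

print(in_G_JS(F(1), [F(2, 3), F(4, 3)], long_only, short_only))
print(in_G_JS(F(1), [F(1), F(1, 2)], long_only, short_only))
print(in_G_JS(F(1), [F(1), F(1)], long_only, short_only))
```

The second candidate fails only on the short-only security, and the third fails on the long-only stock, which illustrates that the two families of constraints restrict $g$ from opposite sides.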

2.3 A few remarks and extensions

• In Jouini et al. (2000), we adopt a new topology on $X$ for the definition of a free lunch. The idea is to weaken the topology on $X$; to motivate this idea, recall that we have considered the norm topology on $L^1(\hat\Omega, \hat{\mathcal{F}}, \hat\mu)$ so that its dual equals $L^\infty(\hat\Omega, \hat{\mathcal{F}}, \hat\mu)$. Considering the elements $g = (g_t)_{t\in\mathbb{R}_+} \in L^\infty(\hat\Omega, \hat{\mathcal{F}}, \hat\mu)$ as functions on $\Omega \times \mathbb{R}_+$, note that, for fixed $\omega \in \Omega$, the function $t \mapsto g_t(\omega)$ does not obey any continuity or measurability requirements (apart from being uniformly bounded). The space $Y = L^\infty(\hat\Omega, \hat{\mathcal{F}}, \hat\mu)$ seems too big for a useful economic interpretation and should be replaced by a space $Y$ of more regular processes, e.g., the adapted bounded processes $(y_t)_{t\in\mathbb{R}_+}$ which almost surely have càd (right continuous) or càg (left continuous) or continuous trajectories. This leads us to consider the space $X = L^1(\hat\Omega, \hat{\mathcal{F}}, \hat\mu)$ in duality with the space $Y$ proposed above and to equip $X$ with a topology $\tau$ compatible with the dual pair $\langle X, Y\rangle$. We prove in Jouini et al. (2000) that in this setting we do have a positive result of Yan type, hence a characterization of the no-free-lunch assumption, without Assumption A; more precisely, we prove that for all closed convex cones $C$ in $X$ such that $C \supseteq X_-$, if $C \cap X_+ = \{0\}$, then we can find a strictly positive linear functional $y \in Y_{++}$ such that $y|_C \le 0$.

• Still in Jouini et al. (2000), we generalize the framework of Section 2.1 by considering a space of investments given by a space of measures. More precisely, we take $X$ given by $\mathcal{M}(\mathbb{R}_+ \times \Omega, \mathcal{O})$, the space of equivalence classes of finite measures $\mu$ on the optional sigma-algebra $\mathcal{O}$, modulo the measures supported by evanescent sets. Note that this enables us to model in $X$ continuous time payment streams (which may or may not be absolutely continuous with respect to Lebesgue measure). We obtain a characterization of the no-free-lunch assumption in such a context.
• We study in Napp (2000) the links between the extremality or the uniqueness of the “admissible discount process” given by the absence of free lunch and the completeness of the market, in the case where the convex cone $J$ of available investments is a linear subspace of $X$. Similar results have been obtained in Jacod (1979), Harrison and Pliska (1981), Delbaen (1992) and Delbaen and Schachermayer (1994).

1 See Section 4 for a description of this set in the transaction costs case.

3 Arbitrage intervals and superreplication cost

Now that we have characterized the absence of free lunch, we shall turn to pricing issues, still in the framework of Section 2.

3.1 Arbitrage intervals

We start with the general framework with a convex cone of available flows. We adopt the approach of Harrison and Kreps (1979). We assume that we are faced with a so-called primitive financial market consisting of a convex cone $C$ of available investment opportunities satisfying Assumption A. We suppose that there is no free lunch in the primitive market or equivalently that there is no free lunch for $C$, so that according to Theorem 2.3, the set $G^C$ is nonempty. In addition to this primitive market, we consider a contingent flow in the form of some process $\check\Phi = (\Phi_t)_{t>0} \in \check X$. The aim of this subsection is to give a “fair” price to this additional contingent flow by only using arbitrage considerations.

We say that $(-\Phi_0)$ is a fair (buying) price for $\check\Phi \in \check X$ if there is no free lunch in the so-called full market consisting of the convex cone generated in $X$ by $C$ and $\Phi \equiv (\Phi_t)_{t\ge 0}$. These values of $(-\Phi_0)$ can be seen as the price to pay at date 0 in order to have access to the flows $\Phi_t$ at each date $t > 0$, in a way that is compatible with the no-free-lunch condition. For all $\check\Phi \in \check X$, let

$$l_{\check\Phi} \equiv \inf_{g\in G^C} \left\langle \check\Phi, \left(\frac{g_t}{g_0}\right)_{t>0} \right\rangle_{\check X,\check Y} \quad\text{and}\quad u_{\check\Phi} \equiv \sup_{g\in G^C} \left\langle \check\Phi, \left(\frac{g_t}{g_0}\right)_{t>0} \right\rangle_{\check X,\check Y}.$$

For simplicity of notation, we shall write interchangeably $\left\langle \check\Phi, \frac{g}{g_0} \right\rangle_{\check X,\check Y}$ or $\left\langle \check\Phi, \left(\frac{g_t}{g_0}\right)_{t>0} \right\rangle_{\check X,\check Y}$.

Lemma 3.1 A price $(-\Phi_0)$ is a fair price for $\check\Phi$ if and only if there exists $g \in G^C$ such that $(-\Phi_0) \ge \left\langle \check\Phi, \frac{g}{g_0} \right\rangle_{\check X,\check Y}$. Any fair price $(-\Phi_0)$ satisfies $(-\Phi_0) \ge l_{\check\Phi}$. Conversely, any price $(-\Phi_0) > l_{\check\Phi}$ is a fair price for $\check\Phi$.

We have obtained a lower bound on the value of any fair (buying) price. Any fair buying price for a contingent flow is a price that is greater than or equal to the net present value of the flow with respect to some admissible discount process. In a natural way, a fair selling price for $\check\Phi \in \check X$ is the opposite of a fair (buying) price for $-\check\Phi$. By applying Lemma 3.1 to $-\check\Phi \equiv (-\Phi_t)_{t>0}$, we get that any fair selling price for $\check\Phi$ satisfies $(-\Phi)_0 \le u_{\check\Phi}$ and that, conversely, any price $(-\Phi)_0 < u_{\check\Phi}$ is a fair selling price for $\check\Phi$. Notice that if $\check\Phi$ can be bought and sold, then by arbitrage considerations, its buying price necessarily lies above its selling price.

We say that $(-\Phi_0)$ is a fair buying–selling price for $\check\Phi \in \check X$ if there is no free lunch in the market consisting of the convex cone generated in $X$ by $C$, $\Phi$ and $-\Phi$. It corresponds to the price at which $\check\Phi$ can be bought and sold without generating any free lunch.

Corollary 3.2 A price $(-\Phi_0)$ is a fair buying–selling price for $\check\Phi$ if and only if there exists $g \in G^C$ such that $(-\Phi_0) = \left\langle \check\Phi, \frac{g}{g_0} \right\rangle_{\check X,\check Y}$. Any fair buying–selling price $(-\Phi_0)$ belongs to $[l_{\check\Phi}, u_{\check\Phi}]$. Conversely, if $l_{\check\Phi} = u_{\check\Phi}$, then there is a unique fair buying–selling price equal to $l_{\check\Phi}$, and if $l_{\check\Phi} < u_{\check\Phi}$, then any price $(-\Phi_0) \in\ ]l_{\check\Phi}, u_{\check\Phi}[$ is a fair buying–selling price for $\check\Phi$.

If $G^C$ is reduced to a singleton, then there exists a unique fair buying–selling price for any $\check\Phi \in \check X$. If $G^C$ is not reduced to a singleton, we only obtain arbitrage intervals for the price of contingent flows. For any contingent flow which can be bought and sold, its arbitrage interval consists of its net present value under all admissible discount processes in $G^C$.

We can now apply these results for the pricing of contingent claims in any market model in $\mathcal{I}$. Let $T \in \mathbb{R}_+^*$. A contingent claim will denote any random variable $H$ in $L^1(\Omega, \mathcal{F}_T, P)$, corresponding to the payoff at date $T$ (see footnote 2). We want to give a fair price to a contingent claim $H$ by only using arbitrage considerations. We still assume that we are faced with a so-called primitive financial market consisting of a convex cone $C$ of available investment opportunities satisfying Assumption A and such that the set $G^C$ is nonempty.
In addition to this primitive market, we assume that investors have access to the contingent claim $H$ so that the set of all available investment opportunities consists of the convex cone generated by $C$ and the contingent flow $\Phi^H \in X$ given by $\Phi^H_T = H$ and $\Phi^H_t = 0$ for all $t \notin \{0, T\}$. We say that $-\Phi^H_0$ is a fair (buying) price for $H$ if it is a fair price for $(\Phi^H_t)_{t>0} \in \check X$. By applying Lemma 3.1 to the investment opportunity $(\Phi^H_t)_{t>0}$ in $\check X$, we immediately get the following result.

Corollary 3.3 Any fair buying price $-\Phi^H_0$ for a contingent claim $H$ satisfies $-\Phi^H_0 \ge \inf_{g\in G^C} E\left[\frac{g_T}{g_0} H\right]$. Any fair selling price for $H$ satisfies $\Phi^{-H}_0 \le \sup_{g\in G^C} E\left[\frac{g_T}{g_0} H\right]$. If $H$ can be bought and sold at the same price, then $-\Phi^H_0 \in \left[\inf_{g\in G^C} E\left[\frac{g_T}{g_0} H\right], \sup_{g\in G^C} E\left[\frac{g_T}{g_0} H\right]\right]$.

We are now able to use the specific characterization of the set $G^C$ obtained in the different imperfect market models in $\mathcal{I}$ (see Jouini and Napp (2001)) to obtain in each case specific arbitrage bounds. We state the result with short sale constraints, i.e. in the case where, with the notation of Section 2, $C$ is given by $J_S$.

Corollary 3.4 If there are short sale constraints, the buying price for any contingent claim $H$ is greater than or equal to $\inf_{g\in G^{J_S}} E\left[\frac{g_T}{g_0} H\right]$, and if there is a selling price for $H$, it is smaller than or equal to $\sup_{g\in G^{J_S}} E\left[\frac{g_T}{g_0} H\right]$.

We shall now pin down these arbitrage intervals, through the use of the superreplication cost.

2 Notice that contingent claims whose payoffs belong to $\check X$, without necessarily being related to a unique date $T$, also fall in our framework.

3.2 Arbitrage bounds and superreplication cost

The aim of this subsection is to show that the upper bound of the arbitrage interval, in a general context with flows as well as in market models with frictions in $\mathcal{I}$, is given by the so-called superreplication cost; for a contingent flow $x \in \check X$, this cost corresponds to the minimum initial wealth needed to obtain, through available investments, at least as much as the flow $x$. This notion was originally introduced by Kreps (1981) for classical contingent claims in the context of incomplete markets (with no other imperfection).

All available investments still consist of a convex cone and we consider the set $M$ of contingent flows in $\check X$ that agents can “dominate” by using available investment opportunities,

$$M \equiv \left\{ x \in \check X,\ \exists \Phi \in J,\ \Phi_t \ge x_t\ \forall t > 0 \right\}.$$

In words, $M$ is the set of flows $m$ for which there exists an available investment (in $J$) which is unambiguously better than $m$ after the initial date. We now introduce on $M$ the notion of superreplication cost.

Definition 3.5 For all $m \in M$, the superreplication cost of $m$ is denoted by $\bar\pi(m)$ and given by

$$\bar\pi(m) \equiv \inf \left\{ \liminf_n \left(-\Phi^n_0\right);\ \Phi^n_t \ge m^n_t\ \forall t > 0,\ (\Phi^n, m^n) \in J \times M,\ m^n \to_{\check X} m \right\}.$$
The superreplication cost represents the infimum wealth necessary to subscribe to an investment opportunity which will provide us with at least as much as a flow arbitrarily close to $m$. As in Jouini and Kallal (1995a) for the case of proportional transaction costs, we start by describing the set $M$ and the functional $\bar\pi$.

2. Arbitrage Pricing with Frictions


Lemma 3.6 The set $M$ is a convex cone. If there is no free lunch for $J$, the price functional $\bar\pi$ is a sublinear (footnote 3), lower semicontinuous (footnote 4) functional which takes values in $\mathbb{R}$.

We are now in a position to obtain a dual representation formula for the upper bound of the arbitrage intervals.

Proposition 3.7 If there is no free lunch for $J$, then for all $m \in M$, $\bar\pi(m) = \sup_{g\in G^J} \left\langle m, \frac{g}{g_0} \right\rangle_{\check X,\check Y}$.

This means that the superreplication cost of a contingent flow is equal to the supremum of its expected value with respect to all admissible discount processes, which coincides with the upper bound of the arbitrage interval. If we now consider some $m \in M$ such that $-m \in M$, a symmetric argument yields

$$-\bar\pi(-m) = \inf_{g\in G^J} \left\langle m, \frac{g}{g_0} \right\rangle_{\check X,\check Y}, \quad\text{or}\quad [-\bar\pi(-m), \bar\pi(m)] = \mathrm{cl}\left\{ \left\langle m, \frac{g}{g_0} \right\rangle_{\check X,\check Y};\ g \in G^J \right\},$$

so that the bounds of the arbitrage intervals, in the general context with flows as well as for contingent claims in imperfect market models (belonging to $\mathcal{I}$), are completely characterized in terms of superreplication cost.

Note that for some authors, the “true” superreplication cost is given on $M$ by $\pi(m) = \inf\{(-\Phi_0);\ \Phi \in J,\ \Phi_t \ge m_t\ \forall t > 0\}$. It is proved in Napp (2000) that under the assumption of no-free-lunch, $\bar\pi$ is the largest lower semicontinuous functional lying below $\pi$. Besides, we investigate when the upper bound of the arbitrage interval is effectively given by the “true” superreplication price $\pi$, in other words, when $\bar\pi = \pi$. We get the equality when $\pi$ is l.s.c., or whenever, for every scalar $\lambda$, the set of contingent flows that can be dominated by an available investment opportunity with initial value smaller than or equal to $\lambda$ is closed. More generally, we consider some specific market models in $\mathcal{I}$ for which simpler expressions for $\bar\pi$ can be obtained: discrete models as well as models with short sale constraints and imperfections on the numéraire if we assume that asset prices are continuous. Notice however that the approach with $\bar\pi$ has enabled us to characterize the arbitrage bounds in a general framework.
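The duality between superreplication cost and the supremum over discount processes can be illustrated numerically. The sketch below assumes a hypothetical frictionless one-period trinomial model with zero interest rate (an incomplete market, so the cone of investments is a linear space); all numbers and function names are illustrative. It computes the superreplication cost by direct minimization over hedges and compares it with the extreme values of $E_q[H]$ over martingale weights $q$. For simplicity the dual side works on the closed set $q \ge 0$; the supremum over strictly positive $q$ is the same in this example but is not attained.

```python
from fractions import Fraction as F
from itertools import combinations

S0 = F(100)                               # stock price at date 0
S1 = [F(80), F(100), F(120)]              # stock price at date 1 in three states
H = [max(s - 100, F(0)) for s in S1]      # call payoff with strike 100

def dual_bounds(S0, S1, H):
    """(inf, sup) of E_q[H] over probability vectors q >= 0 with E_q[S1] = S0.
    A linear functional on this polytope is extremal at a vertex, and the
    vertices have at most two nonzero weights."""
    values = [H[i] for i, s in enumerate(S1) if s == S0]
    for i, j in combinations(range(len(S1)), 2):
        if S1[i] != S1[j]:
            qi = (S0 - S1[j]) / (S1[i] - S1[j])   # weight on state i, rest on j
            if 0 <= qi <= 1:
                values.append(qi * H[i] + (1 - qi) * H[j])
    return min(values), max(values)

def superreplication_cost(S0, S1, H):
    """min of a + b*S0 over cash a and shares b with a + b*S1[w] >= H[w] in every state."""
    def cost(b):
        # For fixed b the cheapest admissible cash amount gives this total cost.
        return max(h - b * (s - S0) for h, s in zip(H, S1))
    # cost is convex and piecewise linear in b, so its minimum sits at a kink,
    # i.e. at a crossing point of two of the lines h - b*(s - S0).
    kinks = [(H[i] - H[j]) / (S1[i] - S1[j])
             for i, j in combinations(range(len(S1)), 2) if S1[i] != S1[j]]
    return min(cost(b) for b in kinks)

lower, upper = dual_bounds(S0, S1, H)
print(lower, upper)
print(superreplication_cost(S0, S1, H))
print(-superreplication_cost(S0, S1, [-h for h in H]))
```

In this example the primal and dual upper bounds agree, and applying the same routine to $-H$ recovers the lower end of the arbitrage interval, in line with the symmetric argument above.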

3 That is, for all $m_1, m_2$ in $M$ and all $\lambda \in \mathbb{R}_+$, we have $\bar\pi(m_1 + m_2) \le \bar\pi(m_1) + \bar\pi(m_2)$ and $\bar\pi(\lambda m_1) = \lambda \bar\pi(m_1)$.

4 That is, such that $\{(m, \lambda) \in M \times \mathbb{R};\ \bar\pi(m) \le \lambda\}$ is closed in $M \times \mathbb{R}$, or equivalently such that $\{m \in M;\ \bar\pi(m) \le \lambda\}$ is closed in $M$ for all $\lambda \in \mathbb{R}$, or equivalently such that $\liminf_n \bar\pi(m_n) \ge \bar\pi(m)$ whenever the sequence $(m_n) \subset M$ converges to $m \in M$.

3.3 A few remarks and extensions

In Napp (2000), we adopt an axiomatic approach. As in Harrison and Pliska (1981) and more recently Jouini (2000) for the case of proportional transaction costs, and Koehl and Pham (2000) for convex constraints, we start from a certain number of axioms that a price functional, defined on the set of contingent flows, must satisfy in order to be admissible. These axioms are linked not only to arbitrage but also to equilibrium considerations. We obtain a dual characterization of all admissible functionals. A similar axiomatic approach will be adopted in Section 4 for models with fixed transaction costs. We also study issues related to the viability (a notion introduced by Harrison and Kreps (1979)), or equivalently to the compatibility with an equilibrium, of the pricing rules we have found. We emphasize that all results obtained for a general contingent flow can be applied to contingent claims in securities market models with frictions belonging to $\mathcal{I}$.

4 Models with fixed transaction costs

We consider in this section financial models where the available investment flows are subject to fixed transaction costs.

4.1 The characterization of the no-free-lunch assumption in a general model with fixed costs

We introduce some notation. We denote by $S^f$ the collection of stopping times of $(\mathcal{F}_t)_{t\ge 0}$ taking a finite number of values in $\mathbb{R}_+$. For any $\tau \in S^f$, we denote by $S^f_\tau$ the class of stopping times $\nu$ in $S^f$ with $\tau \le \nu$ a.s.

Definition 4.1 An investment consists of
1. an initial stopping time $\tau$ in $S^f$,
2. a starting event $B$ in $\mathcal{F}_\tau$,
3. an $(\mathcal{F}_t)_{t\ge 0}$-adapted process $\Phi = (\Phi_t)_{t\ge 0}$ such that $\Phi$ is null outside $B$, and there exists a finite set of stopping times $\tau = \tau^\Phi_1 \le \dots \le \tau^\Phi_{N_\Phi}$ in $S^f_\tau$ for which $\Phi_t = 0$ for all $t \notin \{\tau^\Phi_l\}_{l\in\{1,\dots,N_\Phi\}}$ and, for all $l$, $\Phi_{\tau^\Phi_l} \in L^1(\Omega, \mathcal{F}_{\tau^\Phi_l}, P)$.

We shall call the process $\Phi$ the investment process. The starting stopping time and event can correspond to the stopping time and event at which one investor may subscribe to the investment opportunity. The investment process corresponds to the associated cash flow.

We still consider a convex cone $I$ of available investment processes and for all pairs $(\tau, B) \in S^f \times \mathcal{F}_\tau$, we let $I^{\tau,B}$ (resp. $J^{\tau,B}$) denote the set of all available investment processes associated with investments with starting stopping time $\tau$ and starting event $B$ (resp. starting after $\tau$ and $B$, i.e. $J^{\tau,B} = \bigcup_{\nu\ge\tau,\ B'\subseteq B} I^{\nu,B'}$).

2. Arbitrage Pricing with Frictions


We assume that we can transfer wealth from one date to another, i.e. that, for all stopping times $\tau_1, \tau_2$ in $S^f$ and for all random variables $\theta$ in $L^1(\Omega, \mathcal{F}_{\tau_1\wedge\tau_2}, P)$, the process denoted by $\Phi^{(0;\theta,\tau_1,\tau_2)}$ and given by

$$\Phi^{(0;\theta,\tau_1,\tau_2)}_t = -\theta 1_{t=\tau_1} + \theta 1_{t=\tau_2},$$

with starting stopping time $\tau_1 \wedge \tau_2$ and starting event equal to $\{\theta \ne 0\}$, belongs to the set $I$ of all available investment processes. We shall call transfers the elements of the convex cone generated by all these investment processes.

We assume that it is not costless to subscribe to an investment, i.e. that there are “fixed costs” associated with any investment plan. More precisely, we associate with each investment $(\tau, B, \Phi)$ a nonnegative cost process $c^{(\tau,B,\Phi)} = \left(c^{(\tau,B,\Phi)}_t\right)_{t\ge 0}$; when there is no ambiguity, we shall sometimes write $c^\Phi$ instead of $c^{(\tau,B,\Phi)}$. The assumptions we make on the fixed costs are the following:

• The cost process is $(\mathcal{F}_t)_{t\ge 0}$-adapted, which means that investors know at time $t$ the past and current values of the fixed cost but nothing more.
• The cost process $c^{(\tau,B,\Phi)}$ is null before the stopping time $\tau$, outside the event $B$, and outside a finite number of stopping times in $S^f$.
• There is no fixed cost associated with the transferring of wealth from one date to another, i.e. for every $\Phi \in I$ and every transfer $\Psi$, we have $c^\Phi = c^{\Phi+\Psi}$.
• The total cost associated with any investment opportunity is bounded, i.e. there exists a positive real number $C$ such that $\sum_{t\ge 0} c^\Phi_t \le C$ for all $\Phi \in I$, which can be interpreted as the investors’ refusal to pay more than a certain given amount for fixed costs: this explains why we call these costs fixed costs as opposed to proportional costs.
• The fixed costs incurred at the initial stopping time must be “positive”, i.e. for all $(\tau, B) \in S^f \times \mathcal{F}_\tau$, there exists a positive real number $\varepsilon^{\tau,B}$ such that all investment processes $\Phi \in I^{\tau,B}$ that are not transfers satisfy $c^\Phi_\tau \ge \varepsilon^{\tau,B}$ on $B$.

According to these assumptions, the fixed costs can be interpreted as information costs, opportunity costs, time costs, etc. In a financial market model, they can correspond to fixed brokerage fees. They can account for a sort of cost of accessing (footnote 5) the available investments or more generally for frictions of all kinds.

As usual, an arbitrage opportunity is an investment plan that yields a positive gain in some circumstance, without a countervailing threat of loss in other circumstances, and a free lunch is a possibility of getting arbitrarily close to an arbitrage opportunity.

Definition 4.2 An arbitrage opportunity is an available investment $(\tau, B, \Phi)$ with $\Phi$ in $I$ such that $\Phi_t - c^\Phi_t \ge 0$ for all $t \ge 0$, and there exists a date for which it is nonnull.

5 This “cost of accessing the investment opportunities” can be understood in a general sense: it can be a fee (such as an investment tax), or the cost of setting up an office.


For all pairs $(\tau, B) \in S^f \times \mathcal{F}_\tau$, we let $A^{\tau,B}$ denote the set of all nonnegative investment processes $u$ such that $u_\tau > \varepsilon_u$ on $B$ for some positive constant $\varepsilon_u$, and we obtain the following characterization of the absence of arbitrage opportunity in our model.

Lemma 4.3 There is no arbitrage opportunity if and only if for all $(\tau, B) \in S^f \times \mathcal{F}_\tau$, we have $I^{\tau,B} \cap A^{\tau,B} = \emptyset$.

Using the same notation as for the definition of an arbitrage opportunity, we now introduce the notion of free lunch. We shall consider the set $I$ as a subset of $L^1(\hat\Omega, \hat{\mathcal{F}}, \hat\mu)$, considered in Section 2.1, and adopt the norm topology on this space.

Definition 4.4 There is a free lunch if and only if there exists a pair $(\tau, B) \in S^f \times \mathcal{F}_\tau$ for which $\overline{I^{\tau,B} - L^1_+(\hat\Omega, \hat{\mathcal{F}}, \hat\mu)} \cap A^{\tau,B} \ne \emptyset$, where the bar denotes the closure in $L^1(\hat\Omega, \hat{\mathcal{F}}, \hat\mu)$.

See Jouini, Kallal and Napp (2000) for an interpretation of the definition of a free lunch in a securities market model with fixed transaction costs. Notice that the assumption of no-free-lunch in such a model is less restrictive than in the otherwise identical model without fixed costs. We now obtain the main result.

Theorem 4.5 There is no free lunch if and only if for all $(\tau, B) \in S^f \times \mathcal{F}_\tau$, there exists an absolutely continuous probability measure $P^{\tau,B}$ with bounded density such that $P^{\tau,B}(B) = 1$ and, for every investment process $\Phi$ in $J^{\tau,B}$, $E^{P^{\tau,B}}\left[\sum_{t\ge 0} \Phi_t\right] \le 0$.

This means that the absence of free lunch in our model with fixed trading costs is equivalent to the existence of a family of absolutely continuous probability measures under which the net present value of any available investment is nonpositive.

4.2 Application to securities market models with both fixed and proportional costs

We consider an economy where agents can trade a finite number of securities and we assume that these securities are subject to bid–ask spreads: at each date, there is not a unique price for a security but an ask price, at which investors can buy the security, and a bid price, at which they can sell the security. Notice that this model includes situations where there is a unique price process $Z$ and where the proportional transaction cost remains constant over time, i.e. situations where at each time $t$, investors must pay $Z_t(1 + c)$ for some positive constant $c$ to buy the security and receive $Z_t(1 - c)$ when selling it.

2. Arbitrage Pricing with Frictions


More precisely, we consider $(n + 1)$ securities and for each security $k$, $0 \le k \le n$, we let $(\overline{Z}^k_t)_{t\ge 0}$ and $(\underline{Z}^k_t)_{t\ge 0}$ denote respectively the ask and bid price process. We assume that the $(n + 1)$-dimensional processes $\overline{Z}$ and $\underline{Z}$ are right-continuous and of class $D^f$, i.e. that the families $\{\overline{Z}_\tau\}_{\tau\in S^f}$ and $\{\underline{Z}_\tau\}_{\tau\in S^f}$ are uniformly integrable. For each $k$ in $\{0, \dots, n\}$, for all stopping times $\tau_1$ and $\tau_2$ in $S^f$, and for all nonnegative real-valued bounded $\mathcal{F}_{\tau_1\wedge\tau_2}$-measurable random variables $\theta$, we let $\Phi^{(k;\theta,\tau_1,\tau_2)}$ denote the process given by

$$\Phi^{(k;\theta,\tau_1,\tau_2)}_t = \theta\left(-\overline{Z}^k_{\tau_1} 1_{t=\tau_1} + \underline{Z}^k_{\tau_2} 1_{t=\tau_2}\right)$$

and we assume that the set $I$ of all available investment processes consists of the convex cone generated by all the processes $\Phi^{(k;\theta,\tau_1,\tau_2)}$. This means that all available investment opportunities are related to the buying and selling of the $(n + 1)$ securities, at some stopping times and in random quantities. We still assume that we can transfer wealth without friction, i.e. we set for all $t$, $\overline{Z}^0_t = \underline{Z}^0_t = 1$.

As in the previous section, we assume that there are fixed costs associated with these investment opportunities. The assumptions made on the fixed costs remain the same as above but their interpretation in this specific setting can be made more precise. First, if an investor does not trade in the risky securities at time $t$, then he does not pay any additional cost; but in order to buy at stopping time $\tau$ a “portfolio” $\Theta_\tau$, he must pay $\Theta_\tau \cdot \overline{Z}_\tau + c^\Theta_\tau$, where $c^\Theta_\tau$ denotes the fixed cost to be paid by the investor at stopping time $\tau$ when following the strategy $\Theta$. The fixed cost can depend upon the strategy followed by the investors: for instance at the same date and event, it can be different according to what the investor has done before that date and event; this means equivalently that the fixed costs to be paid are not necessarily the same for all investors.

Second, the aggregated fixed costs are bounded independently of the chosen strategy and independently of the considered investor, or in other words we assume that there exists a positive real number $C$ such that for all strategies $\Theta$, $\sum_{t\ge 0} c^\Theta_t \le C$. This means in particular that the fixed costs to be paid at some date $t$ are bounded independently of the amount traded, which explains why we call them fixed costs as opposed to proportional costs. Finally, we assume that at the first time an investor trades, he incurs a positive fixed cost, which is to be interpreted as a cost of accessing the market. We get the following characterization of the absence of free lunch in a model with proportional and fixed transaction costs.

Theorem 4.6 There is no free lunch in our model with fixed and proportional transaction costs if and only if for all $(\tau, B) \in S^f \times \mathcal{F}_\tau$, there exists an absolutely


continuous probability measure $P^{\tau,B}$ with bounded density such that $P^{\tau,B}(B) = 1$ and some process $S^{\tau,B}$ satisfying

$$\underline{Z}_t 1_{B\cap\{\tau\le t\}} \le S^{\tau,B}_t 1_{B\cap\{\tau\le t\}} \le \overline{Z}_t 1_{B\cap\{\tau\le t\}},$$

$$E^{P^{\tau,B}}\left[S^{\tau,B}_{t\vee\tau} \mid \mathcal{F}_{s\vee\tau}\right] = S^{\tau,B}_{s\vee\tau} \quad\text{for } t \ge s.$$

This means that for all $(\tau, B) \in S^f \times \mathcal{F}_\tau$ there exists an absolutely continuous probability measure $P^{\tau,B}$ that transforms some price process $S^{\tau,B}$, lying after $\tau$ and on $B$ between the discounted bid and ask price processes, into a martingale from the stopping time $\tau$ and event $B$. In the case where there is no proportional transaction cost, i.e. if $\overline{Z} = \underline{Z}$, we find that the absence of free lunch in a securities market model with fixed transaction costs is equivalent to the existence of a family of absolutely continuous martingale measures. Our characterization of the no-free-lunch assumption is then weaker than the classical one, and leads to a larger class of arbitrage-free models.
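The gap between absolutely continuous and equivalent measures can be seen in a toy case. The sketch below assumes a hypothetical one-period, finite-state model with a single risky security and only looks at the pair $(\tau, B) = (0, \Omega)$; the function names and numbers are illustrative. With weights allowed to vanish on some states, the reachable values of $E_q[S_1]$, for $S_1$ between bid and ask, fill the interval from the smallest terminal bid to the largest terminal ask. Requiring strictly positive weights (used here as a stand-in for the classical criterion with an equivalent measure) excludes an endpoint unless it is reached in every state.

```python
def ac_martingale_price_exists(bid0, ask0, bid1, ask1):
    """Is there S_0 in [bid0, ask0], S_1(w) in [bid1[w], ask1[w]] and a
    probability vector q >= 0 with E_q[S_1] = S_0? Weights may vanish."""
    return min(bid1) <= ask0 and max(ask1) >= bid0

def equivalent_martingale_price_exists(bid0, ask0, bid1, ask1):
    """Same question with q > 0 in every state (comparison case)."""
    lo, hi = min(bid1), max(ask1)
    lo_attained = all(b == lo for b in bid1)   # lower endpoint reachable with q > 0
    hi_attained = all(a == hi for a in ask1)   # upper endpoint reachable with q > 0
    below_ok = lo <= ask0 if lo_attained else lo < ask0
    above_ok = hi >= bid0 if hi_attained else hi > bid0
    return below_ok and above_ok

# No bid-ask spread: the price is 100 today and 100 or 120 tomorrow.
# Buying is riskless apart from the fixed cost, so the classical test fails,
# while the absolutely continuous test passes.
print(ac_martingale_price_exists(100, 100, [100, 120], [100, 120]))
print(equivalent_martingale_price_exists(100, 100, [100, 120], [100, 120]))

# If the price may also fall, both tests pass; if it rises in every state, both fail.
print(ac_martingale_price_exists(100, 100, [90, 120], [90, 120]))
print(ac_martingale_price_exists(100, 100, [110, 120], [110, 120]))
```

The first example is a model that the classical criterion rejects but that is free of free lunches once a positive fixed cost must be paid to trade, since the possible gain of zero in the first state no longer covers the cost.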

4.3 Pricing issues in securities market models with fixed transaction costs

The framework is the same as in the previous section except that, in order to concentrate on the fixed costs, we assume that $\overline{Z} = \underline{Z}$, in other words there is no proportional transaction cost. As in Section 3, we consider a finite time horizon $T$, and a contingent claim $H$ to consumption at the terminal date $T$ is a random variable belonging to $L^1(\Omega, \mathcal{F}_T, P)$. A contingent claim $H$ is said to be attainable (in the model without fixed cost) if there exists some available investment process $\Phi$ in $I^{0,\Omega}$ such that $\Phi_t = 0$ for all $t \in\ ]0, T[$ and $\Phi_T = H$. Note that the set $M$ of all attainable contingent claims is a linear space. We shall now define and characterize pricing rules $p$ on $M$ that are admissible. As in Section 3, we introduce the definition of the superreplication price of $H$, $\pi^c(H)$, in our framework with fixed costs:

$$\pi^c(H) \equiv \inf\left\{ -\Phi_0 + c^\Phi_0,\ \Phi \in I^{0,\Omega},\ \Phi_t - c^\Phi_t \ge 0 \text{ for all } t \in\ ]0, T[,\ \Phi_T \ge H + c^\Phi_T \right\}.$$

Definition 4.7 An admissible pricing rule on $M$ is a functional $p$ defined on $M$, such that
1. $p$ induces no arbitrage, i.e., it is not possible to find processes $\Phi^1, \dots, \Phi^n$ in $I^{0,\Omega}$ such that $\Phi^i_t = 0$ for all $t \in\ ]0, T[$ and for which $\sum_{i=1}^n p(\Phi^i_T) \le 0$, $\sum_{i=1}^n \Phi^i_T \ge 0$ and one of the two is nonnull.
2. $p(H) \le \pi^c(H)$.


Part 1 is the usual no-arbitrage condition. Part 2 says that an admissible price for the contingent claim $H$ must be smaller than its superreplication price: if it is possible to obtain a payoff at least equal to $H$ at a cost $\pi^c(H)$, then no rational agent (who prefers more to less) will accept to pay more than $\pi^c(H)$ for the contingent claim $H$. The following proposition characterizes the admissible pricing rules on $M$ through the use of the absolutely continuous martingale measures obtained in Theorem 4.6.

Proposition 4.8 Under the assumption of no-free-lunch, any admissible pricing rule $p$ on $M$ can be written as

$$p(H) = E^{P^*}[H] + c(H) \quad \text{for all } H \text{ in } M$$

where $P^*$ is any absolutely continuous martingale measure and $c$ is a bounded functional defined on $M$. If we assume that for a large enough scalar $\lambda$, we have $p(\lambda x) < \lambda[p(x)]$, then the fixed cost functional is nonnegative; moreover, if we assume that there exists $\varepsilon > 0$, such that for a large enough $\lambda$, $p(\lambda x) < \lambda[p(x) - \varepsilon]$, then the fixed cost is greater than or equal to this positive constant $\varepsilon$.

Notice that Proposition 4.8 implies that $p(\lambda H)/\lambda \to_{\lambda\to\infty} E^{P^*}[H]$ for any attainable contingent claim $H$, where $P^*$ is any absolutely continuous martingale measure. This means that the unit price of any attainable contingent claim $H$ is equal to $E^{P^*}[H]$ in the limit of large quantities. In particular, in a Black–Scholes-like model with fixed costs, the unique asymptotic price for any contingent claim is given by the usual Black–Scholes price.

Appendix A

Proof of Theorem 2.3 The proof is adapted from Yan (1980). It is very similar to the one in Jouini and Napp (2001), where Assumption A is also made. Let $x \in \overline{J - X_+} \cap X_+$, $x = \lim_n x^n$, where for all $n$, $x^n \le \Phi^n$, $\Phi^n \in J$. Then, since $g$ is nonnegative and $g|_J \le 0$, for all $n$, $\langle x^n, g\rangle_{X,Y} \le \langle \Phi^n, g\rangle_{X,Y} \le 0$. This implies $\langle x, g\rangle_{X,Y} \le 0$, hence $x = 0$. Conversely, if $\overline{J - X_+} \cap X_+ = \{0\}$, then for all $x \ne 0$ belonging to $X_+$, the Hahn–Banach Separation Theorem yields the existence of $g \ne 0$ belonging to $Y$ such that $g|_{J - X_+} \le 0 < \langle x, g\rangle_{X,Y}$. It is easy to check that $g$ is nonnegative. Let $G^J$ denote the nonempty set of all nonnegative $g \in Y$, $g|_J \le 0$. We start by proving that for all dates $t$, there exists a process $g^t \in G^J$ such that $g^t_t > 0$ $P$ a.s. Let $\mathcal{S}^t$ be the family of equivalence classes of subsets of $\Omega$ formed


by the supports of the gt for all g in G J . By applying the Separation Theorem to the element x of X + such that x t = 1, xs = 0, ∀s = t, we get that the family S t is not reduced to the empty set. It is easy to see that the family S t is closed under countable unions. Hence there is gt in G J such that S t ≡ gtt > 0 satisfies P S t = sup P (S) ; S ∈ S t . We necessarily have P S t = 1; indeed, if P S t < 1, then we can apply the Separation Theorem to x such that xt = 1(−S t ) , x s = 0, ∀s = t and get the existence of g t ∈ G J , x, g t X,Y > 0. Then gtt + gtt > 0 would be an element of S t , with P-measure strictly greater than S t : a contradiction. Now we show that there exists g ∈ G J such that gdn > 0 almost surely for all dn ∈ d, where d is the sequence introduced in Assumption A. We consider the process g such that for all t ≥ 0, gt= n≥0 an gtdn , where (an )n≥0 is a sequence of positive scalars such that n≥0 an g dn Y < ∞. We find that g belongs to G J and satisfies gdn > 0 almost surely for all dn ∈ d. It remains to show that for all t, gt > 0 P a.s. Assume that for some T outside the set of dates {dn ; n ∈ N } we have just considered, the event BT ≡ {gT = 0} has positive P-probability; according to Assumption A, we know that there exists " ∈ J such that "T = 0 outside BT , "t = 0 ∀t < T , "t ≥ 0 ∀t > T and ∃dn ∈ d, P "dn > 0 > 0. For this particular investment " ∈ J , we would have .", g/ X,Y ≥ E "dn gdn > 0: a contradiction. Proof of Lemma 3.1 Since C satisfies Assumption A, and C is the convex cone ˇ generated in X by C and " ≡ ("t )t≥0 , a price (−"0 ) is a fair price for " if C and only if there exists g ∈ G satisfying E t≥0 gt "t ≤ 0 or, using the strict g ˇ . positivity of g, (−"0 ) ≥ ", g0 Xˇ ,Yˇ

1 ˇ g1 Proof of Corollary 3.2 Since gg0 , g ∈ G C is a convex set, if ", ≤ −"0 ≤ g0 Xˇ ,Yˇ 2 ˇ g2 for g 1 , g 2 ∈ G C , then there exists g ∈ G C , g0 = 1, such that −"0 = ", g0 Xˇ ,Yˇ ˇ g ", . g0 Xˇ ,Yˇ

Proof of Corollary 3.3 Immediate using Lemma 3.1.

Proof of Corollary 3.4 Immediate applying Corollary 3.3.

Proof of Lemma 3.6 The proof is adapted from Kreps (1981) and Jouini and Kallal (1995a). We shall repeatedly use the fact (F) that by a standard diagonalization procedure, there exists a sequence $(\Phi^n, m^n)$, $\Phi^n \ge m^n \to_{\check X} m$, for which $\bar\pi(m) = \lim_n -\Phi^n_0$. By definition, for all $m \in M$, $\bar\pi(m) < \infty$. If there is no free lunch, for all $g \in G^J$, we have $\bar\pi(m) \ge \langle m, g/g_0\rangle_{\check X,\check Y}$ for all $m \in M$; indeed, assume that there exists a sequence $(\Phi^n, m^n)$ in $\check J \times M$ such that $\Phi^n_t \ge m^n_t$ $\forall t > 0$, $m^n \to_{\check X} m$, then for all $g \in G^J$, $-\Phi^n_0 \ge \langle m^n, g/g_0\rangle_{\check X,\check Y} \to_n \langle m, g/g_0\rangle_{\check X,\check Y}$, so that using (F), $\bar\pi(m) \ge \langle m, g/g_0\rangle_{\check X,\check Y}$. In particular, this implies that for all $m \in M$, $\bar\pi(m) > -\infty$ and for all $m \ne 0$ belonging to $\check X_+ \cap M$, $\bar\pi(m) > 0$.

Since $J$ is a convex cone, it is easy to see that $M$ is also a convex cone. Using (F), it is immediate that $\bar\pi$ is such that for all $m_1, m_2$ in $M$ and all $\lambda \in \mathbb{R}_+^*$, we have $\bar\pi(m_1 + m_2) \le \bar\pi(m_1) + \bar\pi(m_2)$ and $\bar\pi(\lambda m_1) = \lambda\bar\pi(m_1)$. By definition of $\bar\pi$, we have $\bar\pi(0) \le 0$; we have seen that for all $g \in G^J$, $\bar\pi(m) \ge \langle m, g/g_0\rangle_{\check X,\check Y}$ for all $m \in M$, thus $\bar\pi(0) = 0$.

Let us show that $\bar\pi$ is l.s.c. Let $\lambda \in \mathbb{R}$ and $(m^n)$ be a sequence in $M$ converging to $m \in M$ such that $\bar\pi(m^n) \le \lambda$ for all $n \ge 0$. Then, using (F), for all $n \ge 0$, there exists $(\Phi^n, m^{*n})$ in $J \times M$ such that $\|m^n - m^{*n}\|_{\check X} \le 1/n$, $\Phi^n_t \ge m^{*n}_t$ $\forall t > 0$ and $-\Phi^n_0 \le \lambda + 1/n$. Since $m^{*n}$ converges to $m$, we must then have $\bar\pi(m) \le \lambda$ and the set $\{m \in M;\ \bar\pi(m) \le \lambda\}$ is closed.

Proof of Proposition 3.7 We show that $(M, \bar\pi)$ satisfies the assumptions of Corollary B.2 in Appendix B. If there is no free lunch, $\bar\pi$ is an l.s.c. functional on the convex cone $M$ (Lemma 3.6). By definition of $M$ and $\bar\pi$, we have $\check X_- \subseteq M$ and $\bar\pi \le 0$ on $\check X_-$. Since there is no free lunch for $J$, $G^J \ne \emptyset$ and for all $g \in G^J$, $\bar\pi(m) \ge \langle m, g/g_0\rangle_{\check X,\check Y}$, hence there exists a positive continuous linear functional on $\check X$ whose restriction to $M$ lies below $\bar\pi$. We can apply Corollary B.2, and we obtain that for all $m \in M$, $\bar\pi(m) = \sup\{l(m),\ l \in \check Y,\ l > 0,\ l|_M \le \bar\pi\}$. It is then easy to verify that a positive $l \in \check Y$ satisfies $l|_M \le \bar\pi$ if and only if it is of the form $l = (g_t/g_0)_{t>0}$ for some $g \in G^J$. Indeed, we have seen in the proof of Lemma 3.6 that any $g \in G^J$, $g_0 = 1$ satisfies $g|_M \le \bar\pi$; conversely, if $l|_M \le \bar\pi$, then for all $\Phi \in J$, $E\left[\sum_{t>0} l_t\Phi_t\right] \le -\Phi_0$ and letting $l_0 = 1$, $(l_t)_{t\ge0}|_J \le 0$.

Proof of Lemma 4.3 If there is an arbitrage opportunity, then there exists an available investment $(\tau, B, \Phi)$ for which $\Phi_t - c^{\Phi}_t \ge 0$ for all $t \ge 0$, hence $\Phi_\tau \ge c^{\Phi}_\tau \ge \varepsilon^{\tau,B}$ on $B$ and $\Phi_t \ge 0$ for all $t \ge 0$, so that $\Phi \in I^{\tau,B} \cap A^{\tau,B}$. Conversely, suppose that there exists $\Phi \in I^{\tau,B} \cap A^{\tau,B}$. Then there exists $\varepsilon^{\Phi} \in \mathbb{R}_+^*$ such that $\Phi_\tau \ge \varepsilon^{\Phi}$. The investment process $\lambda\Phi$ with $\lambda$ such that $\lambda\varepsilon^{\Phi} \ge$


$C$ enables us to get enough at the initial stopping time to cover, through wealth transfer, present and future transaction costs.

Proof of Theorem 4.5 Using Lemma 4.3, it is easy to see that there is no free lunch if and only if for all $(\tau,B) \in \mathcal{S}^f \times \mathcal{F}_\tau$, $\overline{K^{\tau,B} - L^1_+} \cap A^B = \emptyset$, where $K^{\tau,B} \equiv \left\{\sum_{t\ge0}\Phi_t;\ \Phi \in J^{\tau,B}\right\}$, $A^B \equiv \left\{f \in L^1;\ \exists\varepsilon > 0,\ f \ge \varepsilon \text{ on } B\right\}$ and the bar denotes the closure in $L^1(\Omega, \mathbb{R})$. Assume first the existence of a family of absolutely continuous probability measures like in the theorem. Let $u$ belong to $\overline{K^{\tau,B} - L^1_+}$. Then there exist sequences $(u_n)_{n\ge0}$ and $(m_n)_{n\ge0}$ such that $u_n \le m_n$, $m_n \in K^{\tau,B}$ and $u_n \to_{L^1} u$. Since $E^{P^{\tau,B}}[m_n] \le 0$, we have $E^{P^{\tau,B}}[u_n] \le 0$ and since $P^{\tau,B}$ has bounded density, we have $E^{P^{\tau,B}}[u_n] \to_{n\to\infty} E^{P^{\tau,B}}[u]$. Then $E^{P^{\tau,B}}[u] \le 0$ and it is not possible to have $u \ge \varepsilon$ on $B$ for some positive real number $\varepsilon$.

Conversely, assume now that for all $(\tau,B)$ in $\mathcal{S}^f \times \mathcal{F}_\tau$, we have $\overline{K^{\tau,B} - L^1_+} \cap A^B = \emptyset$. Since $J^{\tau,B}$ is a convex cone, the set $K^{\tau,B}$ is also a convex cone and we can apply a strict separation theorem in $L^1$ to the closed convex cone $\overline{K^{\tau,B} - L^1_+}$ and $\{1_B\}$ to find $g^{\tau,B}$ in $L^\infty$ and two real numbers $\alpha$ and $\beta$ with $\alpha < \beta$ such that $g^{\tau,B}|_{\overline{K^{\tau,B} - L^1_+}} \le \alpha < \beta < \langle 1_B, g^{\tau,B}\rangle$. It is easy to see that $g^{\tau,B} \ge 0$, that we can take $\alpha = 0$, that $g^{\tau,B} \ne 0$ on $B$ and that $g^{\tau,B}|_{K^{\tau,B}} \le 0$. Letting then $P^{\tau,B}$ be given by $dP^{\tau,B}/dP \equiv 1_B g^{\tau,B} / E\left[1_B g^{\tau,B}\right]$, we get the result wanted.

Proof of Theorem 4.6 Assume first that there exist a family of probability measures and an associated family of price processes like in the theorem. Then, according to the proof of Theorem 4.5, and adopting the same notations, we only need to prove that for all (τ , B) ∈ S f × Fτ , for all random variables u τ ,B in K τ ,B , E P [u] ≤ 0. Usingthe specific form of K τ ,B , we are reduced to τ ,B proving that E P θ Z τk2 − Z τk 1 ≤ 0 for all τ 1 , τ 2 ∈ Sτf , k ∈ {1, . . . , n} and θ ∈ L ∞ , Fτ 1 ∧τ 2 , P . For such θ, we have k τ ,B k τ ,B τ ,B τ ,B k θ Z τ 2 − Z τk 1 ≤ E P EP − Sττ1,B | Fτ 1 ∧τ 2 . θEP Sτ 2 By the optional sampling theorem (see e.g. Karatzas and Shreve (1988)), we obtain that k τ ,B τ ,B k τ ,B τ ,B k Sτ 2 Sτ 1 EP | Fτ 1 ∧τ 2 = Sττ1,B∧τ 2 = E P | Fτ 1 ∧τ 2 . For the converse implication, we assume that there is no free lunch, so we know from Theorem 4.5 that for all (τ , B) in S f × Fτ , there exists an absolutely continτ ,B uous probability measure P τ ,B with bounded density such that P (B) = 1 and τ ,B P τ ,B for all " ∈ J , E t≥0 "t ≤ 0. For all k ∈ {1, . . . , n}, for any stopping


times τ 1 and τ 2 in Sτf and for all A in Fτ 1 ∧τ 2 , the investment process "(k;1 A ,τ 1 ,τ 2 ) ∈ τ ,B −Z τk 1 + Z τk2 | Fτ 1 ∧τ 2 ≤ 0, thus J τ ,B and we get that E P τ ,B k τ ,B k EP Z τ 2 | Fτ 1 ∧τ 2 ≤ E P Z τ 1 | Fτ 1 ∧τ 2 . (A.1) For all ν ∈ Sτf , we consider the two n-dimensional families Z˜ ν ν∈Sτf and Z˜ ν ν∈Sτf given by τ ,B Z κ | Fν Z˜ ν = ess sup E P f

κ∈Sν

Z˜ ν

= ess inf E P f κ∈Sν

τ ,B

[Z κ | Fν ].

In words, Z˜ νk is the supremum of the conditional expected value of the proceeds from the strategies that consist of going short in the security k (and investing the proceeds in security 0) after the stopping time ν. The random variable Z˜ ν is defined symmetrically. It is a standard result in optimal stopping that for all κ in Sνf τ ,B EP Z˜ κ | Fν ≤ Z˜ ν τ ,B EP Z˜ κ | Fν ≥ Z˜ ν . Now, takingν ≡ s ∨ τ and κ ≡ t ∨ τ for all (s, t) for which s ≤ t, we obtain that τ ,B the process Z˜ t∨τ is a P -supermartingale for (Ft∨τ )t≥0 and that the process t≥0 τ ,B Z˜ t∨τ t≥0 is a P -submartingale for (Ft∨τ )t≥0 . Using inequality (A.1), we have Z˜ t∨τ ≤ Z˜ t∨τ . Now, using Lemma 3 in Jouini and Kallal (1995b) or Proposition 2.6 in andStricker is a process S τ ,B lying between Choulli (1997), we get that τthere ,B Z˜ t∨τ t≥0 and Z˜ t∨τ t≥0 on B, which is a P -martingale for (Ft∨τ )t≥0 . By definition, we have Z ≤ Z˜ and Z˜ ≤ Z after τ and on B, so that after τ and on B, Z ≤ Z˜ ≤ Z˜ ≤ Z . The process S τ ,B is then automatically between Z and Z , after τ and on B, which completes the proof. Proof of Proposition 4.8 We have assumed that there is no arbitrage in the primitive market, so that if " and % in I 0, are such that for all t ∈ ]0, T ], "t = %t , then "0 = %0 . We define on M a linear functional l given by l ("T ) = "0 . Now it is easy to see that for all H in M, lim

λ→+∞

π c (λH ) −π c (−λH ) = lim = l(H ). λ→+∞ λ λ

Since there is no arbitrage, we must have $p(H) \ge -p(-H)$ so that

$$-\pi^c(-H) \le -p(-H) \le p(H) \le \pi^c(H),$$

and the price functional $p$ can be written as the sum of a continuous linear functional and a fixed cost, i.e., for all $H$, $p(H) = l(H) + c(H)$ where $c(\lambda H)/\lambda \to_{\lambda\to\infty} 0$. Notice that $c(H) \equiv p(H) - l(H) \le \pi^c(H) - l(H) \le C$. Consequently, in the absence of free lunch, the fair price $p(H)$ associated with any attainable contingent claim $H$ is given by

$$p(H) = E^{P^*}[H] + c(H)$$

where $P^*$ is any absolutely continuous martingale measure.

Appendix B

Lemma B.1 Any l.s.c. sublinear functional $s$ on a convex cone $K \subseteq \check X$ can be written as the supremum over all continuous linear functionals on $\check X$ whose restriction to $K$ lies below $s$, i.e. for all $k \in K$, $s(k) = \sup\{l(k),\ l \in \check Y,\ l|_K \le s\}$.
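As a finite-dimensional illustration of the statement of Lemma B.1 (not part of the original argument), the following sketch takes, purely as assumptions, the space $\mathbb{R}^2$ in place of $\check X$, the cone $K = \mathbb{R}^2$, and the sublinear functional $s(x) = |x_1| + |x_2|$. In that toy setting a linear functional $l$ lies below $s$ exactly when $\max(|l_1|, |l_2|) \le 1$, and the supremum of $l(x)$ over that square is attained at one of its four corners. The names `s`, `corners` and `sup_of_linear_minorants` are hypothetical.

```python
from itertools import product


def s(x):
    """Toy sublinear functional on R^2 (an assumption for illustration): the l1 norm."""
    return abs(x[0]) + abs(x[1])


# Linear functionals l(x) = l[0]*x[0] + l[1]*x[1] lie below s everywhere exactly when
# max(|l[0]|, |l[1]|) <= 1. A linear function on that square is maximal at a corner,
# so the four corners are enough to evaluate the supremum.
corners = list(product((-1.0, 1.0), repeat=2))


def sup_of_linear_minorants(x):
    """Supremum of l(x) over all linear functionals l lying below s."""
    return max(l[0] * x[0] + l[1] * x[1] for l in corners)


points = [(3.0, -2.0), (0.0, 1.5), (-4.0, -0.5), (0.0, 0.0)]
for x in points:
    # The two printed numbers coincide, as the lemma asserts in general.
    print(x, s(x), sup_of_linear_minorants(x))
```

The infinite-dimensional statement needs the separation argument given in the proof; the sketch only shows what the supremum representation looks like in a case that can be checked by hand.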

Proof We adapt the proof of the Fenchel–Moreau Theorem. Let $t(k) \equiv \sup\{l(k),\ l \in \check Y,\ l|_K \le s\}$. It is immediate that for all $k \in K$, $s(k) \ge t(k)$. Suppose that there exists $k_0 \in K$ such that $t(k_0) < s(k_0)$. Let $A \equiv \{(z,\lambda) \in K \times \mathbb{R},\ s(z) \le \lambda\}$. Since $s$ is sublinear, $A$ is a convex cone. Then the closure of $A$ in $\check X \times \mathbb{R}$, denoted by $\bar A$, is a closed convex cone. Since $s$ is l.s.c., $(k_0, t(k_0)) \notin \bar A$. By the Hahn–Banach Separation Theorem, there exists a continuous linear functional $\varphi$ defined on $\check X \times \mathbb{R}$ and $\alpha \in \mathbb{R}$ such that

$$\varphi(k_0, t(k_0)) < \alpha \le \varphi(z,\lambda) \quad \text{for all } (z,\lambda) \in \bar A. \qquad \text{(B.1)}$$

The set $\bar A$ being a cone, we can take $\alpha = 0$. Hence there exist a continuous linear functional $\varphi_1$ on $\check X$ and $\beta \in \mathbb{R}$ for which $\varphi_1(k_0) + \beta[t(k_0)] < 0 \le \varphi_1(z) + \beta\lambda$ for all $(z,\lambda) \in \bar A$. By taking $z \in D(s)$, i.e. $z$ such that $s(z) < \infty$, and $\lambda = n \to \infty$ in the preceding inequality, we see that $\beta \ge 0$.

Consider first the case $s \ge 0$. Let $\varepsilon \in \mathbb{R}_+^*$. Noting that by definition of $A$, for all $z \in D(s)$, $(z, s(z)) \in A$, we get $\varphi_1(z) + (\beta + \varepsilon)s(z) \ge 0$. This implies that the continuous linear functional $-\frac{1}{\beta+\varepsilon}\varphi_1$ lies below $s$ on $K$, and by definition of $t$, $t(k_0) \ge -\frac{1}{\beta+\varepsilon}\varphi_1(k_0)$. This leads to $\varphi_1(k_0) + (\beta + \varepsilon)t(k_0) \ge 0$ for all $\varepsilon > 0$, which contradicts (B.1). For a general $s$, consider the functional $\bar s \equiv s - f_0$, where $f_0$ is some continuous linear functional lying below $s$ on $K$ (the condition $D(s) \ne \emptyset$ ensures its existence). The functional $\bar s$ is a nonnegative l.s.c. sublinear functional on $K$


such that $D(\bar s) \ne \emptyset$. The first part of the proof may be applied and we know that $\bar t(k) \equiv \sup\{l(k),\ l \in \check Y,\ l|_K \le \bar s\} = \bar s(k)$. It is clear that $\bar t = t - f_0$, hence $s = t$ on $K$.

Corollary B.2 With the same notations as in Lemma B.1, if $K \supseteq \check X_-$ and $s \le 0$ on $\check X_-$, then for all $k \in K$, $s(k) = \sup\{l(k),\ l \in \check Y_+,\ l|_K \le s\}$. Moreover, if there exists $f \in \check Y$, $f > 0$, $f|_K \le s$, then $s(k) = \sup\{l(k),\ l \in \check Y,\ l > 0,\ l|_K \le s\}$.

Proof Let $l \in \check Y$, $l|_K \le s$. If $K \supseteq \check X_-$ and $s \le 0$ on $\check X_-$, then for all $x \in \check X_-$, $\langle x, l\rangle_{\check X,\check Y} \le 0$, which means that $l \in \check Y_+$. Now, suppose that $L \equiv \{f \in \check Y,\ f > 0,\ f|_K \le s\} \ne \emptyset$. Let $f \in L$. For all $l \in \check Y_+$, $l|_K \le s$, $\frac{1}{n}f + \left(1 - \frac{1}{n}\right)l$ is a sequence of elements of $L$, and for all $k \in K$, $\left\langle k, \frac{1}{n}f + \left(1 - \frac{1}{n}\right)l\right\rangle \to_n \langle k, l\rangle$.

References

Adler, I. and Gale, D. (1997), Arbitrage and growth rate for riskless investments in a stationary economy. Math. Fin. 2, 73–81.
Back, K. and Pliska, S.R. (1990), On the fundamental theorem of asset pricing with an infinite state space. J. Math. Econ. 20, 1–18.
Bensaïd, B., Lesne, J.-P., Pagès, H. and Scheinkman, J. (1992), Derivative asset pricing with transaction costs. Math. Fin. 2, 63–86.
Choulli, T. and Stricker, C. (1997), Séparation d'une sur- et d'une sousmartingale par une martingale. Thèse de T. Choulli. Université de Franche-Comté.
Cvitanić, J. and Karatzas, I. (1993), Hedging contingent claims with constrained portfolios. Ann. App. Prob. 3(3), 652–81.
Cvitanić, J. and Karatzas, I. (1996), Hedging and portfolio optimization under transaction costs: a martingale approach. Math. Fin. 6, 133–66.
Dalang, R.C., Morton, A. and Willinger, W. (1989), Equivalent martingale measures and no arbitrage in stochastic securities market models. Stochastics and Stochastic Rep. 29, 185–202.
Debreu, G. (1959), Theory of Value. Wiley, New York.
Delbaen, F. (1992), Representing martingale measures when asset prices are continuous and bounded. Math. Fin. 2, 107–30.
Delbaen, F., Kabanov, Y. and Valkeila, E. (2001), Hedging under transaction costs in currency markets: a discrete-time model. To appear in Math. Fin.
Delbaen, F. and Schachermayer, W. (1994), A general version of the fundamental theorem of asset pricing. Math. Ann. 300, 463–520.
Delbaen, F. and Schachermayer, W. (1998), The fundamental theorem of asset pricing for unbounded stochastic processes. Math. Ann. 312, 215–50.
Duffie, D. and Huang, C. (1986), Multiperiod security markets with differential information: martingales and resolution times. J. Math. Econ. 15, 283–303.
Dybvig, P. and Ross, S. (1987), Arbitrage, in: Eatwell, J., Milgate, M. and Newman, P., eds., The New Palgrave: A Dictionary of Economics, vol. 1. Macmillan, London, 100–6.
El Karoui, N. and Quenez, M.-C. (1995), Dynamic programming and pricing of contingent claims in an incomplete market. SIAM J. Control and Optimization 33, 29–66.


Föllmer, H. and Kramkov, D. (1997), Optional decomposition under constraints. Prob. Theory Relat. Fields 109, 1–25.
Harrison, M. and Kreps, D. (1979), Martingales and arbitrage in multiperiod security markets. J. Econ. Theory 20, 381–408.
Harrison, M. and Pliska, S. (1981), Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes Appl. 11, 215–60.
Jacod, J. (1979), Calcul Stochastique et Problèmes de Martingales. Springer, Berlin.
Jouini, E. (2000), Price functionals with bid–ask spreads. An axiomatic approach. J. Math. Econ. 34, 547–58.
Jouini, E. and Kallal, H. (1995a), Martingales and arbitrage in securities markets with transaction costs. J. Econ. Theory 66, 178–97.
Jouini, E. and Kallal, H. (1995b), Arbitrage in securities markets with short-sales constraints. Math. Fin. 5, 197–232.
Jouini, E. and Kallal, H. (1999), Viability and equilibrium in securities markets with frictions. Math. Fin. 9(3), 275–92.
Jouini, E., Kallal, H. and Napp, C. (2000), Arbitrage and viability in securities markets with fixed transaction costs. To appear in J. Math. Econ.
Jouini, E. and Napp, C. (2001), Arbitrage and investment opportunities. To appear in Finance and Stochastics.
Jouini, E., Napp, C. and Schachermayer, W. (2000), Arbitrage and state price deflators in a general intertemporal framework. Preprint.
Kabanov, Y. (1999), Hedging and liquidation under transaction costs in currency markets. Finance and Stochastics 3(2), 237–48.
Karatzas, I. and Shreve, S. (1988), Brownian Motion and Stochastic Calculus (Graduate Texts in Mathematics, Vol. 113). Springer-Verlag, Berlin.
Koehl, P.-F. and Pham, H. (2000), Sublinear price functionals under portfolio constraints. J. Math. Econ. 33(3), 339–51.
Kreps, D. (1981), Arbitrage and equilibrium in economies with infinitely many commodities. J. Math. Econ. 8, 15–35.
Lakner, P. (1993), Martingale measures for a class of right-continuous processes. Math. Fin. 3(1), 43–53.
Napp, C. (2000), Pricing issues with investment flows. Applications to market models with frictions. To appear in J. Math. Econ.
Schachermayer, W. (1994), Martingale measures for discrete time processes with infinite horizon. Math. Fin. 4, 25–55.
Stricker, C. (1990), Arbitrage et lois de martingale. Ann. Inst. Henri Poincaré, vol. 26, 451–60.
Yan, J.A. (1980), Caractérisation d'une classe d'ensembles convexes de $L^1$ ou $H^1$. Sém. de Probabilités. Lecture Notes in Mathematics XIV 784, 220–2.

3 American Options: Symmetry Properties
Jérôme Detemple

1 Introduction

Put–call symmetry (PCS) holds when the price of a put option can be deduced from the price of a call option by relabeling its arguments. For instance, in the context of the standard financial market model with constant coefficients the value of an American put equals the value of an American call with strike price $S$, maturity date $T$, in a financial market with interest rate $\delta$ and in which the underlying asset price pays dividends at the rate $r$. This result was originally demonstrated by McDonald and Schroder (1990, 1998) using a binomial approximation of the lognormal model and by Bjerksund and Stensland (1993) in the continuous time model using PDE methods; it is a version of the international put–call equivalence (Grabbe (1983)).

Put–call symmetry is a useful property of options since it reduces the computational burden in implementations of the model. Indeed, a consequence of the property is that the same numerical algorithm can be used to price put and call options and to determine their associated optimal exercise policy. Another benefit is that it reduces the dimensionality of the pricing problem for some payoff functions. Examples include exchange options and quanto options. PCS also provides useful insights about the economic relationship between contracts. Puts and calls, forward prices and discount bonds, exchange options and standard options are simple examples of derivatives that are closely connected by symmetry relations.

Some intuition for PCS is based on the properties of the normal distribution. Indeed, in the model with constant coefficients the distribution of the terminal stock price is lognormal. Symmetry of the put and call option payoff function combined with the symmetry of the normal distribution then suggests that the put and call values can be deduced from each other by interchanging the arguments of the pricing functions. This can be verified directly from the valuation formulas for standard European and American options. As demonstrated by Gao, Huang and Subrahmanyam (2000) it is also true for European and American barrier options,


such as down and out call and up and out put options, in the model with constant coefficients.

Since option values depend only on the volatility of the underlying asset price it seems reasonable to conjecture that PCS will hold in diffusion models in which the drift is an arbitrary function of the asset price but the volatility is a symmetric function of the price. This intuition is exploited by Carr and Chesney (1994) who show that PCS indeed extends to such a setting.

Since alternative assumptions about the behavior of the underlying asset price destroy the symmetry of the terminal price distribution it would appear that the property cannot hold in more general contexts. Somewhat surprisingly, Schroder (1999), relying on a change of numeraire introduced by Geman, El Karoui and Rochet (1995), is able to show that the result holds in very general environments including models with stochastic coefficients and discontinuous underlying asset price processes.¹

¹ Symmetry results in general market environments are also reported in Kholodnyi and Price (1998). Their proofs are based on no-arbitrage arguments and use operator theory and group theory notions.

This chapter surveys the latest results in the field and provides further extensions. Our basic market structure is one in which the underlying asset price follows an Itô process with progressively measurable coefficients (including the dividend rate) and the interest rate is an adapted stochastic process. We show that a version of PCS holds under these general market conditions. One feature behind the property is the homogeneity of degree one of the put and call payoff functions with respect to the stock price and the exercise price. For such payoffs the standard symmetry property of prices follows from a simple change of measure which amounts to taking the asset price as numeraire.

The identification of the change of numeraire as a central feature underlying the standard PCS property permits the extension of the result to more complex contracts which involve liquidation provisions. A random maturity option is an option (put or call) which is automatically liquidated at a prespecified random time and, in such an event, pays a prespecified random cash flow. A typical example is a down and out put option with barrier $L$. This option expires automatically if the underlying asset price hits the level $L$ (null liquidation payoff), but pays off $(K - S)^+$ if exercised prior to expiration. Put–call symmetry for random maturity options states that the value of an American put with strike price $K$, maturity date $T$, automatic liquidation time $\tau_l$ and liquidation payoff $H_{\tau_l}$ equals the value of an American call with strike $S$, maturity date $T$, automatic liquidation time $\tau_l^*$ and liquidation payoff $H^*_{\tau_l}$ in an auxiliary financial market with interest rate $\delta$ and in which the underlying asset price pays dividends at the rate $r$ and has initial value $K$. The liquidation characteristics $\tau_l^*$ and $H^*_{\tau_l}$ of the equivalent call can be expressed in terms of the put specifications $K$, $\tau_l$ and $H_{\tau_l}$ and the initial value of the underlying asset $S$. For a down and out put option with barrier $L$ which has characteristics $\tau_L = \inf\{t \in [0,T] : S_t = L\}$ and $H_{\tau_L} = 0$ the equivalent up and out call has characteristics

$$\tau^*_L = \tau_{L^*} = \inf\left\{t \in [0,T] : S^*_t = L^* \equiv \frac{KS}{L}\right\} \quad \text{and} \quad H^*_{\tau_L} = 0,$$

where $S^*$ denotes the price of the underlying asset in the auxiliary financial market.

Contingent claims which are written on multiple assets also exhibit symmetry properties when their payoff is homogeneous of degree one. In fact the same change of measure argument as in the one asset case identifies classes of contracts which are related by symmetry and therefore can be priced off each other. In particular, for contracts on two underlying assets, we show that American call max-options are symmetric to American options to exchange the maximum of an asset and cash against another asset, that American exchange options are symmetric to standard call or put options (on a single underlying asset) and that American capped exchange options with proportional cap are symmetric to both capped call options with constant caps and capped put options with proportional caps. In all of these relationships the symmetric contract is valued in an auxiliary financial market with suitably adjusted interest rate and underlying asset prices.

We then discuss extensions of the property to a class of contracts analyzed recently in the literature, namely occupation time derivatives. These contracts, typically, depend on the amount of time spent by the underlying asset price in certain prespecified regions of the state space. Examples of such path-dependent contracts are Parisian and cumulative barrier options (Chesney, Jeanblanc-Picqué and Yor (1997)), step options (Linetsky (1999)) and quantile options (Miura (1992)). More general payoffs based on the occupation time of a constant set, above or below a barrier, are discussed in Hugonnier (1998). While the literature has focused exclusively on European-style contracts in the context of models with geometric Brownian motion price processes, we consider American-style occupation time derivatives in models with Itô price processes.
We also allow for occupation times of random sets. We show that occupation time derivatives with homogeneous payoff functions satisfy a symmetry property in which the symmetric contract depends on the occupation time of a suitably adjusted random set. Extensions to multiasset occupation time derivatives are also presented.

Symmetry-like properties also hold when the contract under consideration is homogeneous of degree $\nu \ne 1$. In this instance the interest rate in the auxiliary economy depends on the coefficient $\nu$, the interest rate in the original economy and the dividend rate and volatility coefficients of the numeraire asset in the original


economy. The dividend rates of other assets in the new numeraire are also suitably adjusted.

Since symmetry properties reflect the passage to a new numeraire asset it is of interest to examine the replicability of attainable payoffs under changes of numeraire. For the case of nondividend paying assets Geman, El Karoui and Rochet (1995) have established that contingent claims that are attainable in one numeraire are also attainable in any other numeraire and that the replicating portfolios are the same. We show that these results extend to the case of dividend-paying assets. This demonstrates that any symmetric contract can indeed be attained in the appropriate auxiliary economy with new numeraire and that its price satisfies the usual representation formula involving the pricing measure and the interest rate that characterize the auxiliary economy.

The second section reviews the property in the context of the standard model with constant coefficients. In Section 3 PCS is extended to a financial market model with Brownian filtration and stochastic opportunity set. The Markovian model with diffusion price process (and general volatility structure) is examined as a subcase of the general model. Extensions to random maturity options, multiasset contingent claims, occupation time derivatives and payoffs that are homogeneous of degree $\nu$ are carried out in Sections 4–7. Questions pertaining to changes of numeraire, replicating portfolios and representation of asset prices are examined in Section 8. Concluding remarks are formulated last.

2 Put–call symmetry in the standard model

We consider the standard financial market model with constant coefficients (constant opportunity set). The underlying asset price, $S$, follows a geometric Brownian motion process

$$dS_t = S_t\left[(r - \delta)\,dt + \sigma\,dz_t\right], \quad t \in [0,T]; \ S_0 \text{ given} \qquad (1)$$

where the coefficients $(r, \delta, \sigma)$ are constant. Here $r$ represents the interest rate, $\delta$ the dividend rate and $\sigma$ the volatility of the asset price. The asset price process (1) is represented under the equivalent martingale measure $Q$: the process $z$ is a $Q$-Brownian motion. In this complete financial market it is well known that the price of any contingent claim can be obtained by a no-arbitrage argument. In particular the value of a European call option with strike price $K$ and maturity date $T$ is given by the Black


and Scholes (1973) formula

$$c(S_t, K, r, \delta, t) = S_t e^{-\delta(T-t)} N\big(d(S_t, K, r, \delta, T-t)\big) - K e^{-r(T-t)} N\big(d(S_t, K, r, \delta, T-t) - \sigma\sqrt{T-t}\big) \qquad (2)$$

where

$$d(S, K, r, \delta, T-t) = \frac{\log(S/K) + (r - \delta + \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}. \qquad (3)$$

Similarly the value of a European put with the same characteristics $(K, T)$ is

$$p(S_t, K, r, \delta, t) = K e^{-r(T-t)} N\big(-d(S_t, K, r, \delta, T-t) + \sigma\sqrt{T-t}\big) - S_t e^{-\delta(T-t)} N\big(-d(S_t, K, r, \delta, T-t)\big). \qquad (4)$$
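Formulas (2)–(4) are easy to evaluate numerically. The following is a minimal sketch, not taken from the chapter: the function names (`norm_cdf`, `d`, `call`, `put`), the explicit `sigma` argument, the use of `tau` for $T - t$ and the numerical inputs are all assumptions made for illustration. The last lines evaluate the call formula with $(S, K)$ and $(r, \delta)$ interchanged, which is the substitution behind the symmetry relation (5).

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def d(S, K, r, delta, sigma, tau):
    """Argument d(S, K, r, delta, T - t) of formula (3), with tau = T - t."""
    return (log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))


def call(S, K, r, delta, sigma, tau):
    """European call value, formula (2)."""
    d1 = d(S, K, r, delta, sigma, tau)
    return (S * exp(-delta * tau) * norm_cdf(d1)
            - K * exp(-r * tau) * norm_cdf(d1 - sigma * sqrt(tau)))


def put(S, K, r, delta, sigma, tau):
    """European put value, formula (4)."""
    d1 = d(S, K, r, delta, sigma, tau)
    return (K * exp(-r * tau) * norm_cdf(-d1 + sigma * sqrt(tau))
            - S * exp(-delta * tau) * norm_cdf(-d1))


# Hypothetical inputs, chosen only for illustration.
S, K, r, delta, sigma, tau = 100.0, 90.0, 0.05, 0.02, 0.25, 0.75

p_value = put(S, K, r, delta, sigma, tau)
# Same call formula with (S, K) and (r, delta) interchanged.
c_swapped = call(K, S, delta, r, sigma, tau)
print(p_value, c_swapped)
```

The two printed numbers should agree up to floating-point rounding, since the swap turns $d$ into $-d + \sigma\sqrt{T-t}$.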

Comparison of these two formulas leads to the following symmetry property:

Theorem 1 (European PCS) Consider European put and call options with identical characteristics $K$ and $T$ written on an asset with price $S$ given by (1). Let $p(S, K, r, \delta, t)$ and $c(S, K, r, \delta, t)$ denote the respective price functions. Then

$$p(S, K, r, \delta, t) = c(K, S, \delta, r, t). \qquad (5)$$

Proof of Theorem 1 Substituting $(K, S, \delta, r)$ for $(S, K, r, \delta)$ in (2) and using

$$d(K, S, \delta, r, T-t) = \frac{\log(K/S) + (\delta - r + \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}} = -\frac{\log(S/K) + (r - \delta + \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}} + \sigma\sqrt{T-t} = -d(S, K, r, \delta, T-t) + \sigma\sqrt{T-t} \qquad (6)$$

gives the desired result.

This result shows that the put value in the financial market under consideration is the same as the value of a call option with strike price $S$ and maturity date $T$ in an economy with interest rate $\delta$ and in which the underlying asset price follows a geometric Brownian motion process with dividend rate $r$, volatility $\sigma$ and initial value $K$, under the risk neutral measure. This symmetry property between the value of puts and calls is even more striking when we consider American options. For these contracts Kim (1990), Jacka (1991) and Carr, Jarrow and Myneni (1992) have shown that the value of a call has the early exercise premium representation (EEP)


C(St , K , r, δ, t, B c (·)) = c(St , K , r, δ, t) + π(St , K , r, δ, t, B c (·))

(7)

where $C(S, K, r, \delta, t, B^c(\cdot))$ is the value of the American call, $c(S, K, r, \delta, t)$ represents the value of the European call in (2) and $\pi(S, K, r, \delta, t, B^c(\cdot))$ is the early exercise premium
$$\pi(S_t, K, r, \delta, t, B^c(\cdot)) = \int_t^T \phi(S_t, K, r, \delta, v-t, B_v^c)\,dv \tag{8}$$

with
$$\phi(S_t, K, r, \delta, v-t, B_v^c) = \delta S_t e^{-\delta(v-t)} N\big(d(S_t, B_v^c, r, \delta, v-t)\big) - rK e^{-r(v-t)} N\big(d(S_t, B_v^c, r, \delta, v-t) - \sigma\sqrt{v-t}\big). \tag{9}$$

The exercise boundary $B^c(\cdot)$ of the call option solves the recursive integral equation
$$B_t^c - K = C(B_t^c, K, r, \delta, t, B^c(\cdot)) \tag{10}$$

subject to the boundary condition $B_T^c = \max(K, \tfrac{r}{\delta}K)$. Let $B^c(K, r, \delta, t)$ denote the solution.

The EEP representation for the American put can be obtained by following the same approach as for the call. Alternatively the put value can be deduced from the call formula by appealing to the following result (McDonald and Schroder (1998)).

Theorem 2 (American PCS) Consider American put and call options with identical characteristics $K$ and $T$ written on an asset with price $S$ given by (1). Let $P(S, K, r, \delta, t, B^p(\cdot))$ and $C(S, K, r, \delta, t, B^c(\cdot))$ denote the respective price functions and $B^p(K, r, \delta, \cdot)$ and $B^c(S, r, \delta, \cdot)$ the corresponding immediate exercise boundaries. Then
$$P(S, K, r, \delta, t, B^p(K, r, \delta, \cdot)) = C(K, S, \delta, r, t, B^c(S, \delta, r, \cdot)) \tag{11}$$
and for all $t \in [0, T]$
$$B^p(K, r, \delta, t) = \frac{SK}{B^c(S, \delta, r, t)}. \tag{12}$$
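Relation (11) can be illustrated on a lattice. The sketch below uses a Cox-Ross-Rubinstein binomial tree, which is an approximation swapped in here for the continuous-time model and is not the method of the text; the function name, the step count and the parameter values are assumptions made for the example. On this particular tree (with $d = 1/u$) the symmetry holds node by node, so the two values agree up to rounding and not merely up to discretization error.

```python
import math


def crr_american(S, K, r, delta, sigma, T, steps, kind):
    """American option value on a Cox-Ross-Rubinstein tree (illustrative sketch).

    kind is "call" or "put"; delta is a continuous dividend rate.
    """
    if kind not in ("call", "put"):
        raise ValueError("kind must be 'call' or 'put'")
    sign = 1.0 if kind == "call" else -1.0
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp((r - delta) * dt) - d) / (u - d)  # risk neutral up-probability
    disc = math.exp(-r * dt)

    # Payoffs at maturity, indexed by the number j of up-moves.
    values = [
        max(sign * (S * u ** j * d ** (steps - j) - K), 0.0)
        for j in range(steps + 1)
    ]
    # Backward induction with an early exercise check at every node.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            continuation = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            exercise = max(sign * (S * u ** j * d ** (n - j) - K), 0.0)
            values[j] = max(continuation, exercise)
    return values[0]


# Illustrative (hypothetical) parameters.
S, K, r, delta, sigma, T, steps = 100.0, 95.0, 0.05, 0.03, 0.2, 1.0, 200

american_put = crr_american(S, K, r, delta, sigma, T, steps, "put")
# Mirror economy: strike S, initial price K, interest rate delta, dividend rate r.
mirrored_call = crr_american(K, S, delta, r, sigma, T, steps, "call")
print(american_put, mirrored_call)
```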

This result can again be demonstrated by substitution along the lines of the proof of Theorem 1. A more elegant approach relies on a change of measure detailed in the next section.

Hence, even for American options the value of a put is the same as the value of a call with strike $S$, maturity date $T$, in an economy with interest rate $\delta$ and in which the underlying asset price, under the risk neutral measure, follows a geometric Brownian motion process with dividend rate $r$, volatility $\sigma$ and initial value $K$. Furthermore the exercise boundary for the American put equals the inverse of the exercise boundary for the American call with characteristics $(S, \delta, r)$ multiplied by the product $SK$.

Some intuition for this result rests on the properties of normal distributions. In models with constant coefficients $(r, \delta, \sigma)$ the value of put and call options can be expressed in terms of the cumulative normal distribution. Combining the symmetry of the normal distribution with the symmetry of the put and call payoffs leads to the relationship between the option values and the exercise boundaries. A priori this intuition may suggest that the property does not extend beyond the financial market model with constant coefficients. As we show next this conjecture turns out to be incorrect.

3 Put–call symmetry with Itô price processes

In this section we demonstrate that a version of PCS holds under fairly general financial market conditions. The key to the approach is the adoption of the stock as a new numeraire. Changes of numeraire have been discussed thoroughly in the literature, in particular in Geman, El Karoui and Rochet (1995). The extension of options' symmetry properties to general uncertainty structures based on this change of numeraire is due to Schroder (1999). This section considers a special case of Schroder, namely a market with Brownian filtration.

Suppose we have an economy with finite time period $[0, T]$, a complete probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\mathcal{F}_{(\cdot)}$. A Brownian motion process $z$ is defined on $(\Omega, \mathcal{F})$ and takes values in $\mathbb{R}$. The filtration is the natural filtration generated by $z$ and $\mathcal{F}_T = \mathcal{F}$.

The financial market has a stochastic opportunity set and nonmarkovian price dynamics. The underlying asset price follows the Itô process
$$dS_t = S_t[(r_t - \delta_t)\,dt + \sigma_t\,d\tilde z_t], \quad t \in [0, T]; \ S_0 \text{ given} \tag{13}$$
under the $Q$-measure. The interest rate $r$, the dividend rate $\delta$ and the volatility coefficient $\sigma$ are progressively measurable and bounded processes of the Brownian filtration $\mathcal{F}_{(\cdot)}$ generated by the underlying Brownian motion process $z$. The process $\tilde z$ is a $Q$-Brownian motion.

At various stages of the analysis we will also be led to consider an alternative financial market with interest rate $\delta$, in which the underlying asset price $S^*$ satisfies
$$dS_t^* = S_t^*[(\delta_t - r_t)\,dt + \sigma_t\,dz_t^*], \quad t \in [0, T]; \ S_0^* \text{ given} \tag{14}$$


under some risk neutral measure $Q^*$. In this market the asset has dividend rate $r$ and volatility coefficient $\sigma$. The process $z^*$ is a Brownian motion under the pricing measure $Q^*$. Both $z^*$ and $Q^*$ will be specified further as we proceed.

We first state a relationship between the values of European puts and calls in the general financial market model under consideration.

Theorem 3 (Generalized European PCS) Consider a European put option with characteristics $K$ and $T$ written on an asset with price $S$ given by (13) in the market with stochastic interest rate $r$. Let $p(S, K, r, \delta; \mathcal{F}_t)$ denote the put price process. Then
$$p(S_t, K, r, \delta; \mathcal{F}_t) = c(S_t^*, S, \delta, r; \mathcal{F}_t) \tag{15}$$
where $c(S_t^*, S, \delta, r; \mathcal{F}_t)$ is the value of a call with strike price $S = S_t$ and maturity date $T$ in a financial market with interest rate $\delta$ and in which the underlying asset price follows the Itô process (14) for $v \in [t, T]$ with initial value $S_t^* = K$ and with $z^*$ defined by
$$dz_v^* = -d\tilde z_v + \sigma_v\,dv \tag{16}$$
for $v \in [0, T]$, with $z_0^* = 0$.

This result extends the PCS property of the previous section to nonmarkovian economies with Itô price processes and progressively measurable interest rates. The key behind this general equivalence is a change of measure, detailed in the proof, which converts a put option in the original economy into a call option with symmetric characteristics in the auxiliary economy. Note that the equivalence is obtained by switching $(S, K, r, \delta)$ to $(S^*, S, \delta, r)$, but keeping the trajectories of the Brownian motion the same, i.e. the filtration which is used to compute the value of the call in the auxiliary financial market is the one generated by the original Brownian motion $z$. Thus information is preserved across economies. In effect the change of measure creates a new asset whose price is the inverse of the original asset price adjusted by a multiplicative factor which depends only on the initial conditions. As we shall see below in the context of diffusion models the change of measure is instrumental in proving the symmetry property without placing restrictions on the volatility coefficient.

Proof of Theorem 3 In the original financial market the value $p_t \equiv p(S_t, K, r, \delta; \mathcal{F}_t)$ of the put option with characteristics $(K, T)$ has the (present value) representation
$$p_t = E\left[\exp\left(-\int_t^T r_v\,dv\right)\left(K - S_t \exp\left(\int_t^T \alpha_v\,dv + \int_t^T \sigma_v\,d\tilde z_v\right)\right)^+ \Big|\ \mathcal{F}_t\right]$$


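where $\alpha \equiv r - \delta - \tfrac{1}{2}\sigma^2$ and the expectation is taken relative to the equivalent martingale measure $Q$. Simple manipulations show that the right hand side of this equation equals
$$E\left[\exp\left(-\int_t^T \left(\delta_v + \tfrac{1}{2}\sigma_v^2\right)dv + \int_t^T \sigma_v\,d\tilde z_v\right) \times \left(K \exp\left(-\int_t^T \alpha_v\,dv - \int_t^T \sigma_v\,d\tilde z_v\right) - S_t\right)^+ \Big|\ \mathcal{F}_t\right].$$
Consider the new measure
$$dQ^* = \exp\left(-\frac{1}{2}\int_0^T \sigma_v^2\,dv + \int_0^T \sigma_v\,d\tilde z_v\right) dQ \tag{17}$$
which is equivalent to $Q$. Girsanov's Theorem (1960) implies that the process
$$dz_v^* = -d\tilde z_v + \sigma_v\,dv \tag{18}$$
is a $Q^*$-Brownian motion. Substituting (18) in the put pricing formula and passing to the $Q^*$-measure yields
$$p_t = E^*\left[\exp\left(-\int_t^T \delta_v\,dv\right)\left(K \exp\left(\int_t^T \left(\delta_v - r_v - \tfrac{1}{2}\sigma_v^2\right)dv + \int_t^T \sigma_v\,dz_v^*\right) - S_t\right)^+ \Big|\ \mathcal{F}_t\right]. \tag{19}$$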

But the right hand side is the value of a call option with strike $S = S_t$, maturity date $T$ in an economy with interest rate $\delta$, asset price with dividend rate $r$ and initial value $S_t^* = K$, and pricing measure $Q^*$.

An even stronger version of the preceding result is obtained if the coefficients of the model are adapted to the subfiltration generated by the process $z^*$. Let $\mathcal{F}^*_{(\cdot)}$ denote the filtration generated by this $Q^*$-Brownian motion process.

Corollary 4 Suppose that the coefficients $(r, \delta, \sigma)$ are adapted to the filtration $\mathcal{F}^*_{(\cdot)}$. Then
$$p(S_t, K, r, \delta; \mathcal{F}_t) = c(S_t^*, S, \delta, r; \mathcal{F}_t^*)$$
where $c(S_t^*, S, \delta, r; \mathcal{F}_t^*)$ is the value of a call with strike price $S = S_t$ and maturity date $T$ in a financial market with information filtration $\mathcal{F}^*_{(\cdot)}$ generated by the $Q^*$-Brownian motion process (16), interest rate $\delta$ and in which the underlying asset price follows the Itô process (14) with initial value $S_t^* = K$.
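With constant coefficients the change of measure behind Theorem 3 can be verified path by path: for every realization of $\tilde z_T$, the discounted put payoff under $Q$ equals the discounted call payoff in the auxiliary economy weighted by the density (17). The sketch below checks this identity on random draws. It assumes constant $(r, \delta, \sigma)$ and $t = 0$, and the function name, seed and parameter values are illustrative choices made here.

```python
import math
import random


def max_pathwise_gap(S, K, r, delta, sigma, T, n_draws=1000, seed=0):
    """Largest absolute gap between the two sides of the pathwise identity.

    Left side : exp(-r T) (K - S_T)^+ in the original economy.
    Right side: exp(-delta T) (dQ*/dQ) (S*_T - S)^+ in the auxiliary economy,
                with dz* = -d z~ + sigma dt as in (16) and dQ*/dQ as in (17).
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_draws):
        z_tilde = rng.gauss(0.0, math.sqrt(T))  # terminal value of z~ under Q
        S_T = S * math.exp((r - delta - 0.5 * sigma ** 2) * T + sigma * z_tilde)
        put_side = math.exp(-r * T) * max(K - S_T, 0.0)

        density = math.exp(-0.5 * sigma ** 2 * T + sigma * z_tilde)
        z_star = -z_tilde + sigma * T
        S_star_T = K * math.exp((delta - r - 0.5 * sigma ** 2) * T + sigma * z_star)
        call_side = math.exp(-delta * T) * density * max(S_star_T - S, 0.0)

        worst = max(worst, abs(put_side - call_side))
    return worst


# Illustrative (hypothetical) parameters.
gap = max_pathwise_gap(S=100.0, K=95.0, r=0.05, delta=0.02, sigma=0.3, T=1.0)
print(gap)
```

Taking expectations under $Q$ of both sides then gives (19) in this special case; the printed gap should be of the order of floating point rounding.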


In the context of this corollary part of the information embedded in the original information filtration generated by the Brownian motion $z$ may be irrelevant for pricing the put option. Since all the coefficients are adapted to the subfiltration generated by $z^*$ this is the only information which matters in computing the expectation under $Q^*$ in (19).

Remark 5 Note that the standard European PCS in the model with constant coefficients is a special case of this corollary. Indeed in this setting direct integration over $z^*$ leads to the call value in the auxiliary economy and the put value in the original economy.

Let us now consider the case of American options. For these contracts early exercise, prior to the maturity date $T$, is under the control of the holder. At any time prior to the optimal exercise time the put value $P_t \equiv P(S_t, K, r, \delta; \mathcal{F}_t)$ in the original economy is (see Bensoussan (1984) and Karatzas (1988))
$$P_t = \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right)\left(K - S_t \exp\left(\int_t^\tau \left(r_v - \delta_v - \tfrac{1}{2}\sigma_v^2\right)dv + \int_t^\tau \sigma_v\,d\tilde z_v\right)\right)^+ \Big|\ \mathcal{F}_t\right]$$
where $\mathcal{S}_{t,T}$ denotes the set of stopping times of the filtration $\mathcal{F}_{(\cdot)}$ with values in $[t, T]$. Using the same arguments as in the proof of Theorem 3 we can write
$$P_t = \sup_{\tau \in \mathcal{S}_{t,T}} E^*\left[\exp\left(-\int_t^\tau \delta_v\,dv\right)\left(K \exp\left(\int_t^\tau \left(\delta_v - r_v - \tfrac{1}{2}\sigma_v^2\right)dv + \int_t^\tau \sigma_v\,dz_v^*\right) - S_t\right)^+ \Big|\ \mathcal{F}_t\right]$$

where the expectation is relative to the equivalent measure $Q^*$ and conditional on the information $\mathcal{F}_t$. Since the change of measure performed does not affect the set of stopping times over which the holder optimizes the following result holds.

Theorem 6 (Generalized American PCS) Consider an American put option with characteristics $K$ and $T$ written on an asset with price $S$ given by (13) in the market with stochastic interest rate $r$. Let $P(S, K, r, \delta; \mathcal{F}_t)$ denote the American put price process and $\tau^p(K, r, \delta)$ the optimal exercise time. Then, prior to exercise, the put price is
$$P(S_t, K, r, \delta; \mathcal{F}_t) = C(S_t^*, S, \delta, r; \mathcal{F}_t) \tag{20}$$
where $C(S_t^*, S, \delta, r; \mathcal{F}_t)$ is the value of an American call with strike price $S = S_t$ and maturity date $T$ in a financial market with interest rate $\delta$ and in which the underlying asset price follows the Itô process (14) with initial value $S_t^* = K$ and with $z^*$ defined by (16). The optimal exercise time for the put option is
$$\tau^p(S, K, r, \delta) = \tau^c(K, S, \delta, r) \tag{21}$$
where $\tau^c(K, S, \delta, r)$ denotes the optimal exercise time for the call option.

Remark 7 Consider the model with constant coefficients $(r, \delta, \sigma)$. In this setting the optimal exercise time for the call option in the auxiliary financial market is

$$\tau^c(K, S, \delta, r) = \inf\left\{t \in [0, T] : K \exp\left(\left(\delta - r - \tfrac{1}{2}\sigma^2\right)t + \sigma z_t^*\right) = B^c(S, \delta, r, t)\right\}.$$
On the other hand the optimal exercise time for the put option in the original financial market is
$$\tau^p(S, K, r, \delta) = \inf\left\{t \in [0, T] : S \exp\left(\left(r - \delta - \tfrac{1}{2}\sigma^2\right)t + \sigma \tilde z_t\right) = B^p(K, r, \delta, t)\right\}$$
where $B^p(K, r, \delta, t)$ is the put exercise boundary. Using the definition of $z^*$ in (16) we conclude immediately that
$$B^p(K, r, \delta, t) = \frac{SK}{B^c(S, \delta, r, t)}.$$
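The step from the two stopping times to the boundary relation can be spelled out as follows. This is a short sketch under the constant-coefficient assumptions of Remark 7, using only (16) integrated to $z_t^* = -\tilde z_t + \sigma t$.

```latex
\begin{align*}
K \exp\!\Big(\big(\delta - r - \tfrac{1}{2}\sigma^2\big)t + \sigma z_t^*\Big)
  &= K \exp\!\Big(\big(\delta - r - \tfrac{1}{2}\sigma^2\big)t - \sigma \tilde z_t + \sigma^2 t\Big) \\
  &= K \exp\!\Big(-\big(r - \delta - \tfrac{1}{2}\sigma^2\big)t - \sigma \tilde z_t\Big)
   = \frac{SK}{S_t},
\end{align*}
% where S_t = S exp((r - delta - sigma^2/2) t + sigma z~_t) is the original asset price.
% Hence the call is exercised when SK / S_t = B^c(S, delta, r, t), that is when
% S_t = SK / B^c(S, delta, r, t), and equality of the two stopping times in (21)
% forces B^p(K, r, delta, t) = SK / B^c(S, delta, r, t).
```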

3.1 Diffusion financial market models

Suppose that the stock price satisfies the stochastic differential equation
$$dS_t = S_t[(r(S_t, t) - \delta(S_t, t))\,dt + \sigma(S_t, t)\,d\tilde z_t], \quad t \in [0, T]; \ S_0 \text{ given} \tag{22}$$
under the $Q$-measure. In this market the interest rate $r$ may depend on the stock price and along with the other coefficients of (22) satisfies appropriate Lipschitz and growth conditions for the existence of a unique strong solution (see Karatzas and Shreve (1988)). We assume that the solution is continuous relative to the initial conditions.

Since this markovian financial market is a special case of the general model of the previous section PCS holds. However, in the model under consideration the exercise regions of options have a simple structure which leads to a clear comparison between the put and the call exercise policies.

Define the discount factor
$$R_{t,s} = \exp\left(-\int_t^s r(S_v, v)\,dv\right)$$
for $t, s \in [0, T]$ and the $Q$-martingale


$$M_{t,s} \equiv \exp\left(-\frac{1}{2}\int_t^s \sigma(S_v, v)^2\,dv + \int_t^s \sigma(S_v, v)\,d\tilde z_v\right)$$
for $t, s \in [0, T]$, $s \ge t$.

Consider an American call option and let $\mathcal{E}$ denote the exercise set. Continuity of the strong solution of (22) relative to the initial conditions implies that the option price is continuous and that the exercise region is a closed set. Thus we can meaningfully define its boundary $B^c$.$^2$ Let $\mathcal{E}(t)$ denote the $t$-section of the exercise region. The EEP representation for a call option with strike $K$ and maturity date $T$ is
$$C(S_t, K, r, \delta, t, B^c(\cdot)) = c(S_t, K, r, \delta, t) + \pi(S_t, K, r, \delta, t, B^c(\cdot)) \tag{23}$$

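where $C(S, K, r, \delta, t, B^c(\cdot))$ is the value of the American call, $c(S, K, r, \delta, t)$ represents the value of the European call
$$c(S_t, K, r, \delta, t) = E\left[\left(S_t \exp\left(-\int_t^T \delta(S_v, v)\,dv\right) M_{t,T} - K R_{t,T}\right)^+ \Big|\ S_t\right] \tag{24}$$
and $\pi_t \equiv \pi(S_t, K, r, \delta, t, B^c(\cdot))$ is the early exercise premium
$$\pi_t = E\left[\int_t^T \left(\delta(S_s, s)\, S_t \exp\left(-\int_t^s \delta(S_v, v)\,dv\right) M_{t,s} - r(S_s, s)\, K R_{t,s}\right) 1_{\{S_s \in \mathcal{E}(s)\}}\,ds \ \Big|\ S_t\right]. \tag{25}$$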

In these expressions dependence on $r$ and $\delta$ is meant to represent dependence on the functional form of $r(\cdot)$ and $\delta(\cdot)$. The boundary $B^c(\cdot)$ of the exercise set for the call option solves the recursive integral equation
$$B_t^c - K = C(B_t^c, K, r, \delta, t, B^c(\cdot)) \tag{26}$$
subject to the boundary condition $B_T^c = \max\big(K, (r(B_T^c, T)/\delta(B_T^c, T))K\big)$. Let $B^c(K, r, \delta, t)$ denote the solution. The optimal exercise policy for the call is to exercise at the stopping time
$$\tau^c(S, K, r, \delta) = \inf\left\{t \in [0, T] : S R_{0,t}^{-1} \exp\left(-\int_0^t \delta(S_v, v)\,dv\right) M_{0,t} = B^c(K, r, \delta, t)\right\}. \tag{27}$$

$^2$ If the exercise region is up-connected the exercise boundary is unique. Failure of this property may imply the existence of multiple boundaries.


In this context put–call symmetry leads to

Proposition 8 Consider an American put option with characteristics $K$ and $T$ written on an asset with price $S$ given by (22) in the market with interest rate $r(S, t)$. Let $P(S, K, r, \delta, t)$ denote the American put price process and $\tau^p(S, K, r, \delta)$ the optimal exercise time. Then, prior to exercise, the put price is
$$P(S_t, K, r, \delta, t) = C(S_t^*, S, \delta, r, t) \tag{28}$$
where $C(S_t^*, S, \delta, r, t)$ is the value of an American call with strike price $S = S_t$ and maturity date $T$ in a financial market with stochastic interest rate $\delta$ and in which the underlying asset price $S^*$ satisfies the stochastic differential equation
$$dS_v^* = S_v^*\left[\left(\delta\left(\frac{SK}{S_v^*}, v\right) - r\left(\frac{SK}{S_v^*}, v\right)\right)dv + \sigma\left(\frac{SK}{S_v^*}, v\right)dz_v^*\right], \quad \text{for } v \in [t, T] \tag{29}$$
with initial value $S_t^* = K$ and with $z^*$ defined by (16). The optimal exercise time for the put option is $\tau^p(S, K, r, \delta) = \tau^c(K, S, \delta, r)$ and the exercise boundaries are related by
$$B^p(K, r, \delta, t) = \frac{SK}{B^c(S, \delta, r, t)}. \tag{30}$$

In the financial market setting of (22) all the information relevant for future payoffs is embedded in the current stock price. Any strictly monotone transformation of the price is also a sufficient statistic. Thus the passage from the original economy to the auxiliary economy with stock price (29) preserves the information required to price derivatives with future payoffs. No information beyond the current price $S_t^*$ is required to assess the correct evolution of the coefficients of the underlying asset price process. This stands in contrast with the general model with Itô price processes in which the path of the Brownian motion needs to be recorded in the auxiliary economy for proper evaluation of future distributions.

Note also that the change of measure converts the original underlying asset into a symmetric asset with inverse price up to a multiplicative factor depending only on the initial conditions. Since the change of measure can be performed independently of the structure of the coefficients the results are valid even in the absence of symmetry-like restrictions on the volatility coefficient.

Proof of Proposition 8 The first part of the proposition follows from Theorem 6. To prove the relationship between the exercise boundaries note that the call in the auxiliary economy has strike $S$, so that its boundary at maturity equals
$$B^c = \max(S, b^c)$$
where $b^c$ solves the nonlinear equation
$$r\left(\frac{SK}{b^c}, T\right) b^c - \delta\left(\frac{SK}{b^c}, T\right) S = 0.$$
In this expression we used the relation $S_T = SK/S_T^*$. Now with the change of variables $b^p = SK/b^c$ it is clear that $b^p$ solves
$$r(b^p, T)K - \delta(b^p, T)b^p = 0$$
and that the put boundary at the maturity date satisfies (30). To establish the relation prior to the maturity date it suffices to use the recursive integral equation for the call boundary, pass to the $Q^*$-measure and perform the change of variables indicated. The resulting expression is the recursive integral equation for the put boundary.

The results in this section can be easily extended to multivariate diffusion models $(S, Y)$ where $Y$ is a vector of state variables impacting the coefficients of the underlying asset price process. Passage to the measure $Q^*$, in this case, introduces a risk premium correction in the state variables processes. Multivariate models in that class are discussed extensively in Schroder (1999).

4 Options with random expiration dates

We now consider a class of American derivatives which mature automatically if certain prespecified conditions are satisfied. Let $\tau_l$ denote a stopping time of the filtration and let $H = \{H_t : t \in [0, T]\}$ denote a progressively measurable process. A call option with maturity date $T$, strike $K$, automatic liquidation time $\tau_l$ and liquidation payoff $H$ pays $(S - K)^+$ if exercised by the holder at date $t < \tau_l$. If $\tau_l$ materializes prior to $T$ the option automatically matures and pays off $H_{\tau_l}$. A random maturity put option with characteristics $(K, T, \tau_l, H)$ has similar provisions but pays $(K - S)^+$ if exercised prior to the automatic liquidation time $\tau_l$. Options with such characteristics are referred to as random maturity options.

Popular examples of such contracts are barrier options such as down and out put options and up and out call options. Both of these contracts become worthless when the underlying asset price reaches a prespecified level $L$ (i.e. the liquidation payoff is a constant $H = 0$). Another example is an American capped call option with automatic exercise at the cap $L$. This option is automatically liquidated at the random time
$$\tau_l = \tau_L \equiv \inf\{t \in [0, T] : S_t = L\}$$
or $\tau_L = \infty$ if no such time materializes in $[0, T]$, and pays off the constant $H = L - K$ in that event. If $\tau_L > T$ the option payoff is $(S - K)^+$.$^3$ Capped options with growing caps and automatic exercise at the cap are examples in which the automatic liquidation payoff is time dependent.

Consider again the general financial market model with underlying asset price given by (13). Recall the definitions of the discount factor
$$R_{t,s} = \exp\left(-\int_t^s r_v\,dv\right)$$

for $t, s \in [0, T]$ and the $Q$-martingale
$$M_{t,s} \equiv \exp\left(-\frac{1}{2}\int_t^s \sigma_v^2\,dv + \int_t^s \sigma_v\,d\tilde z_v\right)$$
for $t, s \in [0, T]$, $s \ge t$. Let $P_t = P(S, K, T, \tau_l, H, r, \delta; \mathcal{F}_t)$ denote the value of an American random maturity put with characteristics $(K, T, \tau_l, H)$. In this financial market the put value is given by
$$P_t = \sup E\Big[R_{t,\tau}\Big(K - S_t R_{t,\tau}^{-1} \exp\Big(-\int_t^\tau \delta_v\,dv\Big) M_{t,\tau}\Big)^+ 1_{\{\tau\,\cdots\}} \cdots$$

[…] $> 0$ where $A^\pm(L)$ is defined above. Again the PCS relation (39) holds in this case.

Put and call step options are special cases of the occupation time derivatives in which the payoff function involves exponential discounting. Closed form solutions are provided by Linetsky for a geometric Brownian motion price process.

Occupation time derivatives can be easily generalized to the multiasset case. For a progressively measurable stochastic closed set $A \in \mathbb{R}^n_+$ and a vector of asset prices $S \in B(\mathbb{R}^n_+)$ a multiasset $f$-claim has payoff $f(S, K, O^{S,A})$ where
$$O_t^{S,A} = \int_0^t 1_{\{S_v \in A_v\}}\,dv, \quad t \in [0, T].$$

A natural generalization of Theorem 13 is

Theorem 15 Consider an American occupation time $f$-claim with maturity date $T$ and a payoff function $f(S, K, O^{S,A})$ which is homogeneous of degree one in $(S, K)$. Let $V(S, K, O^{S,A}, r, \delta; \mathcal{F}_t)$ denote the value of the claim in the financial market with filtration $\mathcal{F}_{(\cdot)}$, asset prices $S$ satisfying (37) and progressively measurable interest rate $r$. Pick some arbitrary index $j$ and define
$$\lambda_j \equiv \frac{K}{S^j} \quad \text{and} \quad \lambda_j(\delta) \equiv \frac{r}{\delta^j}.$$
Prior to exercise the value of the multiasset occupation time $f$-claim is
$$V(S_t, K, O^{S,A}, r, \delta; \mathcal{F}_t) = V^j(S_t^*, S^j, O^{S^*,A^*}, \delta^j, \lambda_j(\delta) \circ_j \delta; \mathcal{F}_t)$$
where $A^* = \{A^*(v, \omega), v \in [t, T]\}$ with $A^*(v, \omega) = \{x \in \mathbb{R}^n_+ : x_i = y_i S^j/y_j \text{ for } i \ne j,\ x_j = K S^j/y_j \text{ and } y = (y_1, \ldots, y_n) \in A(v, \omega)\}$ and $O_t^{S^*,A^*} \equiv O_t^{S,A}$. Also $V^j(S_t^*, S^j, O_t^{S^*,A^*}, \delta^j, \lambda_j(\delta) \circ_j \delta; \mathcal{F}_t)$ is the value of the $f^j$-claim with parameter $S^j = S_t^j$, maturity date $T$ and occupation time $O_t^{S^*,A^*}$ in an auxiliary financial market with interest rate $\delta^j$ and in which the underlying asset prices follow the Itô processes
$$\begin{cases} dS_v^{i*} = S_v^{i*}[(\delta_v^j - \delta_v^i)\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}]; & \text{for } i \ne j \text{ and } v \ge t \\ dS_v^{j*} = S_v^{j*}[(\delta_v^j - r_v)\,dv + \sigma_v^j\,dz_v^{j*}]; & \text{for } i = j \text{ and } v \ge t \end{cases}$$
with respective initial conditions $S^i$ for $i \ne j$ and $K$ for $i = j$. The process $z^{j*}$ is defined by
$$dz_v^{j*} = -d\tilde z_v + \sigma_v^j\,dv$$

for all $v \in [0, T]$, $z_0^{j*} = 0$. The optimal exercise time for the $f$-claim is the same as the optimal exercise time for the $f^j$-claim in the auxiliary financial market.

Some particular cases are the natural counterpart of standard multiasset options.

1. Cumulative barrier max- and min-options: When there are two underlying assets call options in this category have payoff functions of the form $(S_t^1 \vee S_t^2 - K)^+ 1_{\{O_t^{S,A} \ge b\}}$ (max-option) or $(S_t^1 \wedge S_t^2 - K)^+ 1_{\{O_t^{S,A} \ge b\}}$ (min-option), where $b \in [0, T]$. Similarly for put options. It is easily verified that a cumulative barrier call max-option is symmetric to a cumulative barrier option to exchange the maximum of an asset and cash against another asset for which the occupation time has been adjusted.

2. Cumulative barrier exchange options: The payoff function takes the form $(S^1 - S^2)^+ 1_{\{O_t^{S,A} \ge b\}}$. This exchange option is symmetric to cumulative barrier call and put options with suitably adjusted occupation times.

3. Quantile options (Miura (1992), Akahori (1995), Dassios (1995)): An $\alpha$-quantile call option pays off $(M(\alpha, t) - K)^+$ upon exercise where
$$M(\alpha, t) = \inf\left\{x : \int_0^t 1_{\{S_v \le x\}}\,dv > \alpha t\right\} = \inf\{x : O_t^{S,A(x)} > \alpha t\}.$$
Consider an $\alpha$-quantile strike put with payoff $(M(\alpha, t) - S_t)^+$. Note that
$$M(\alpha, t) = \inf\left\{x : \int_0^t 1_{\{S_v \le x\}}\,dv > \alpha t\right\} = \inf\left\{x : \int_0^t 1_{\{S S_v/S_t \le S x/S_t\}}\,dv > \alpha t\right\} = (S_t/S) \inf\left\{y : \int_0^t 1_{\{S S_v/S_t \le y\}}\,dv > \alpha t\right\} \equiv (S_t/S) M^*(\alpha, t)$$
where $M^*(\alpha, t)$ is the $\alpha$-quantile of the normalized price $S^*_{v,t} \equiv S S_v/S_t$ for $v \le t$. Thus $M(\alpha, t) = (S_t/S) M^*(\alpha, t)$ and an $\alpha$-quantile strike put is seen to be symmetric to an $\alpha$-quantile call option with (fixed) strike price $S$ and quantile based on the normalized asset price $S^*_{v,t}$, $v \le t$.
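The scaling identity $M(\alpha, t) = (S_t/S) M^*(\alpha, t)$ used for quantile options can be illustrated on a discretely sampled path. The sketch below replaces the occupation time integral by the fraction of sample points at or below a level, which is a discretization assumed here and not part of the text; the function name and the sample path are illustrative.

```python
import math


def occupation_quantile(path, alpha):
    """Discrete analogue of M(alpha, t) = inf{x : time spent at or below x > alpha t}.

    The path is sampled on an equally spaced grid, so the occupation time of
    (-inf, x] is approximated by the fraction of samples with value <= x.
    """
    n = len(path)
    for level in sorted(path):
        if sum(1 for s in path if s <= level) / n > alpha:
            return level
    raise ValueError("alpha must be strictly less than 1")


# Hypothetical sampled price path; S is the initial price and S_t the current one.
path = [100.0, 104.0, 98.0, 101.0, 107.0, 103.0, 99.0, 105.0]
S, S_t = path[0], path[-1]
alpha = 0.5

M = occupation_quantile(path, alpha)

# Normalized price S*_{v,t} = S * S_v / S_t and its quantile M*.
normalized = [S * s / S_t for s in path]
M_star = occupation_quantile(normalized, alpha)

print(M, (S_t / S) * M_star)
```

Because multiplying a path by a positive constant preserves the ordering of its values, the quantile scales by the same constant, which is all the identity expresses.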

Multiasset step options can also be defined in a natural manner and satisfy symmetry properties akin to those of standard multiasset options.

7 Symmetry property without homogeneity of degree one

Several derivative securities have payoffs that are not homogeneous of degree one. Examples include digital options and quantile options (homogeneous of degree $\nu = 0$) or product options (homogeneous of degree $\nu \ne 0, 1$). Product options (options on a product of assets) include options on foreign indices with payoff in domestic currency such as quanto options. As we show below, even in these cases, symmetry-like properties link various types of contracts.


Consider an $f$-claim on $n$ underlying assets whose payoff is homogeneous of degree $\nu$, i.e., $f(\lambda S, \lambda K) = \lambda^\nu f(S, K)$ for some $\nu \ge 0$ and for all $\lambda > 0$. The following result is then valid.

Theorem 16 Consider an American $f$-claim with maturity date $T$ and a continuous and homogeneous of degree $\nu$ payoff function $f(S, K)$. Let $V(S, K, r, \delta; \mathcal{F}_t)$ denote the value of the claim in the financial market with filtration $\mathcal{F}_{(\cdot)}$, asset prices $S_t$ satisfying (37) and progressively measurable interest rate $r$. For $j = 1, \ldots, n$, define
$$r^{j*} = (1 - \nu)r + \nu\delta^j + \tfrac{1}{2}\nu(1 - \nu)\sigma^j\sigma^j$$
$$\delta^{i*} = (1 - \nu)r + \delta^i + (\nu - 1)\delta^j + (1 - \nu)\left(-1 + \tfrac{1}{2}\nu\right)\sigma^j\sigma^j + (1 - \nu)\sigma^i\sigma^j, \quad \text{for } i \ne j$$
$$\delta^{j*} = (2 - \nu)r + (\nu - 1)\delta^j + (1 - \nu)\left(-1 + \tfrac{1}{2}\nu\right)\sigma^j\sigma^j.$$
Prior to exercise the value of the claim is, for any $j = 1, \ldots, n$,
$$V(S_t, K, r, \delta; \mathcal{F}_t) = V^j(S_t^*, S^j, r^{j*}, \delta^*; \mathcal{F}_t)$$
where $V^j(S_t^*, S^j, r^{j*}, \delta^*; \mathcal{F}_t)$ is the value of the $f^j$-claim with parameter $S^j$ and maturity date $T$ in an auxiliary financial market with interest rate $r^{j*}$ and in which the underlying asset prices follow the Itô processes
$$\begin{cases} dS_v^{i*} = S_v^{i*}[(r_v^{j*} - \delta_v^{i*})\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}]; & \text{for } i \ne j \text{ and } v \in [t, T] \\ dS_v^{j*} = S_v^{j*}[(r_v^{j*} - \delta_v^{j*})\,dv + \sigma_v^j\,dz_v^{j*}]; & \text{for } i = j \text{ and } v \in [t, T] \end{cases}$$
with respective initial conditions $S_t^{*i} = S^i$ for $i \ne j$ and $S_t^{*j} = K$ for $i = j$. The process $z^{j*}$ is defined by
$$dz_v^{j*} = -d\tilde z_v + \nu\sigma_v^j\,dv, \quad \text{for } v \in [0, T];\ z_0^{j*} = 0.$$
The optimal exercise time for the $f$-claim is the same as the optimal exercise time for the $f^j$-claim in the auxiliary financial market.
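The coefficient formulas of Theorem 16 are easy to mistype, so a small consistency check may help. The sketch below treats all coefficients as constant scalars (an assumption made here; in the text they are processes, and the products $\sigma^i\sigma^j$ may be products of vectors) and uses an illustrative function name. It confirms that the auxiliary drifts $r^{j*} - \delta^{i*}$ and $r^{j*} - \delta^{j*}$ take the form $\delta^j - \delta^i + (1-\nu)(\sigma^j - \sigma^i)\sigma^j$ and $\delta^j - r + (1-\nu)\sigma^j\sigma^j$, and that $\nu = 1$ gives back the degree-one coefficients.

```python
import math


def auxiliary_rates(nu, r, delta_i, delta_j, sigma_i, sigma_j):
    """Return (r_j_star, delta_i_star, delta_j_star) from Theorem 16.

    Scalar sketch: sigma_i * sigma_j stands in for the product sigma^i sigma^j.
    """
    sjj = sigma_j * sigma_j
    r_j_star = (1 - nu) * r + nu * delta_j + 0.5 * nu * (1 - nu) * sjj
    delta_i_star = (
        (1 - nu) * r
        + delta_i
        + (nu - 1) * delta_j
        + (1 - nu) * (-1 + 0.5 * nu) * sjj
        + (1 - nu) * sigma_i * sigma_j
    )
    delta_j_star = (2 - nu) * r + (nu - 1) * delta_j + (1 - nu) * (-1 + 0.5 * nu) * sjj
    return r_j_star, delta_i_star, delta_j_star


# Illustrative (hypothetical) coefficients.
r, delta_i, delta_j, sigma_i, sigma_j = 0.05, 0.02, 0.03, 0.25, 0.40

for nu in (0.0, 0.5, 1.0, 2.0):
    r_star, d_i_star, d_j_star = auxiliary_rates(nu, r, delta_i, delta_j, sigma_i, sigma_j)
    drift_i = delta_j - delta_i + (1 - nu) * (sigma_j - sigma_i) * sigma_j
    drift_j = delta_j - r + (1 - nu) * sigma_j * sigma_j
    print(nu, math.isclose(r_star - d_i_star, drift_i, abs_tol=1e-12),
          math.isclose(r_star - d_j_star, drift_j, abs_tol=1e-12))

degree_one = auxiliary_rates(1.0, r, delta_i, delta_j, sigma_i, sigma_j)
degree_zero = auxiliary_rates(0.0, r, delta_i, delta_j, sigma_i, sigma_j)
```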

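Proof of Theorem 16 Define $S^j = S_t^j$. Let
$$r_v^{j*} = (1 - \nu)r_v + \nu\delta_v^j + \tfrac{1}{2}\nu(1 - \nu)\sigma_v^j\sigma_v^j$$
and note that
$$\exp\left(-\int_t^\tau r_v\,dv\right)\left(\frac{S_\tau^j}{S^j}\right)^\nu = \exp\left(-\int_t^\tau r_v^{j*}\,dv\right)\exp\left(-\frac{1}{2}\nu^2\int_t^\tau \sigma_v^j\sigma_v^j\,dv + \nu\int_t^\tau \sigma_v^j\,d\tilde z_v\right).$$
Defining the equivalent measure
$$dQ^{j*} = \exp\left(-\frac{1}{2}\nu^2\int_0^T \sigma_v^j\sigma_v^j\,dv + \nu\int_0^T \sigma_v^j\,d\tilde z_v\right) dQ$$
enables us to write
$$\begin{aligned} V(S_t, K, r, \delta; \mathcal{F}_t) &= \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right) f(S_\tau, K) \ \Big|\ \mathcal{F}_t\right] \\ &= \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right)\left(\frac{S_\tau^j}{S^j}\right)^\nu f\left(S_\tau\frac{S^j}{S_\tau^j}, K\frac{S^j}{S_\tau^j}\right) \Big|\ \mathcal{F}_t\right] \\ &= \sup_{\tau \in \mathcal{S}_{t,T}} E^{j*}\left[\exp\left(-\int_t^\tau r_v^{j*}\,dv\right) f\left(S_\tau\frac{S^j}{S_\tau^j}, S_\tau^{j*}\right) \Big|\ \mathcal{F}_t\right] \\ &= \sup_{\tau \in \mathcal{S}_{t,T}} E^{j*}\left[\exp\left(-\int_t^\tau r_v^{j*}\,dv\right) f^j(S_\tau^*, S^j) \ \Big|\ \mathcal{F}_t\right] \\ &= V^j(S_t^*, S^j, r^{j*}, \delta^*; \mathcal{F}_t). \end{aligned}$$
Under $Q^{j*}$ the process
$$dz_v^{j*} = -d\tilde z_v + \nu\sigma_v^j\,dv$$
is a Brownian motion and $S^{i*}$ satisfies, for $i \ne j$ and $v \in [t, T]$
$$\begin{aligned} dS_v^{i*} &= S_v^{i*}[(\delta_v^j - \delta_v^i + (\sigma_v^j - \sigma_v^i)\sigma_v^j)\,dv - (\sigma_v^j - \sigma_v^i)\,d\tilde z_v] \\ &= S_v^{i*}[(\delta_v^j - \delta_v^i + (\sigma_v^j - \sigma_v^i)\sigma_v^j)\,dv + (\sigma_v^j - \sigma_v^i)[dz_v^{j*} - \nu\sigma_v^j\,dv]] \\ &= S_v^{i*}[(\delta_v^j - \delta_v^i + (1 - \nu)(\sigma_v^j - \sigma_v^i)\sigma_v^j)\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}] \\ &= S_v^{i*}[(r_v^{j*} - \delta_v^{i*})\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}] \end{aligned}$$
where
$$\delta_v^{i*} = (1 - \nu)r_v + \delta_v^i + (\nu - 1)\delta_v^j + (1 - \nu)\left(-1 + \tfrac{1}{2}\nu\right)\sigma_v^j\sigma_v^j + (1 - \nu)\sigma_v^i\sigma_v^j$$
and for $i = j$ and $v \in [t, T]$
$$\begin{aligned} dS_v^{j*} &= S_v^{j*}[(\delta_v^j - r_v + \sigma_v^j\sigma_v^j)\,dv - \sigma_v^j\,d\tilde z_v] \\ &= S_v^{j*}[(\delta_v^j - r_v + (1 - \nu)\sigma_v^j\sigma_v^j)\,dv + \sigma_v^j\,dz_v^{j*}] \\ &= S_v^{j*}[(r_v^{j*} - \delta_v^{j*})\,dv + \sigma_v^j\,dz_v^{j*}] \end{aligned}$$
where
$$\delta_v^{j*} = (2 - \nu)r_v + (\nu - 1)\delta_v^j + (1 - \nu)\left(-1 + \tfrac{1}{2}\nu\right)\sigma_v^j\sigma_v^j.$$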

This completes the proof of the theorem.

Remark 17 When the claim is homogeneous of degree 1 the interest rate and the dividend rates in the economy with numeraire $j$ become $r_v^{j*} = \delta_v^j$, $\delta_v^{i*} = \delta_v^i$, for $i \ne j$, and $\delta_v^{j*} = r_v$. Thus we recover the prior results of Theorem 13.

Another special case of interest is when the payoff function is homogeneous of degree 0. The economy with numeraire $j$ then has characteristics
$$r^{j*} = r$$
$$\delta^{i*} = r + \delta^i - \delta^j - (\sigma^j - \sigma^i)\sigma^j, \quad \text{for } i \ne j$$
$$\delta^{j*} = 2r - \delta^j - \sigma^j\sigma^j$$
and the underlying asset prices follow the Itô processes
$$\begin{cases} dS_v^{i*} = S_v^{i*}[(r_v^{j*} - \delta_v^{i*})\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}]; & \text{for } i \ne j \text{ and } v \in [t, T] \\ dS_v^{j*} = S_v^{j*}[(r_v^{j*} - \delta_v^{j*})\,dv + \sigma_v^j\,dz_v^{j*}]; & \text{for } i = j \text{ and } v \in [t, T] \end{cases}$$
with respective initial conditions $S_t^{*i} = S^i$ for $i \ne j$ and $S_t^{*j} = K$ for $i = j$. The process $z^{j*}$ is defined by $dz_v^{j*} = -d\tilde z_v$, for $v \in [0, T]$. It is a Brownian motion under $Q^* = Q$. Examples of contracts in this category are

1. Digital options: A digital call option ($f(S, K) = 1_{\{S \ge K\}}$) is symmetric to a digital put option with strike $S = S_t$, written on an asset with dividend rate $\delta^* = 2r - \delta - \sigma^2$, in an economy with interest rate $r^* = r$.

2. Digital multiasset options: A digital call max-option ($f(S^1, S^2, K) = 1_{\{S^1 \vee S^2 \ge K\}}$) is symmetric to a digital option to exchange the maximum of an asset and cash against another asset ($f^2(S^1, S^2, K) = 1_{\{S^{*1} \vee K \ge S^{*2}\}}$, where $K = S^2$) in the economy with asset $j = 2$ as numeraire (with characteristics $r^{2*} = r$, $\delta^{1*} = r + \delta^1 - \delta^2 - (\sigma^2 - \sigma^1)\sigma^2$, and $\delta^{2*} = 2r - \delta^2 - \sigma^2\sigma^2$). A digital call min-option ($f(S^1, S^2, K) = 1_{\{S^1 \wedge S^2 \ge K\}}$) is symmetric to a digital option to exchange the minimum of an asset and cash against another asset ($f^2(S^1, S^2, K) = 1_{\{S^{*1} \wedge K \ge S^{*2}\}}$, where $K = S^2$) in the same auxiliary economy. Similar relations hold for digital multiasset put options.

3. Cumulative barrier digital options: Symmetry properties for occupation time derivatives with homogeneous of degree zero payoffs can be easily identified by drawing on the previous section. A cumulative barrier digital call option with barrier $L$ (i.e. payoff $f(S, K, O^{S,A^+(L)}) = 1_{\{S \ge K\}} 1_{\{O_t^{S,A^+(L)} \ge b\}}$ where $A^+(L) = \{x \in \mathbb{R}_+ : (x - L)^+ \ge 0\}$) is symmetric to a cumulative barrier digital put option with barrier $L^* = KS/L$ (i.e. payoff $f^1(S^*, K, O^{S^*,A^-(L^*)}) = 1_{\{K \ge S^*\}} 1_{\{O_t^{S^*,A^-(L^*)} \ge b\}}$ where $K = S$ and $A^-(L^*) = \{x \in \mathbb{R}_+ : (x - L^*)^- \ge 0\}$). A similar symmetry relation can be established for Parisian digital call and put options.

4. Quanto options: Consider again the quanto call option with payoff $e(S - K)^+$ in foreign currency where $e$ is the Y/\$ exchange rate. From the foreign perspective the contract is homogeneous of degree $\nu = 2$ in the triplet $(e, S, K)$. The results of Theorem 16 imply that the quanto call is symmetric to an exchange option in an economy with interest rate $r^{f*} = -r_f + 2r - \sigma^e\sigma^e$ and in which the underlying assets have dividend rates
$$\delta^{1*} = -r_f + \delta + r - \sigma\sigma^e$$
$$\delta^{2*} = r.$$
The call value can be written
$$C_t^Q = e_t \sup_{\tau \in \mathcal{S}_{t,T}} E^{f*}\left[\exp\left(-\int_t^\tau r_v^{f*}\,dv\right)(S_\tau^{1*} - S_\tau^{2*})^+ \ \Big|\ \mathcal{F}_t\right]$$
where
$$\begin{cases} dS_v^{1*} = S_v^{1*}[(r_v^{f*} - \delta_v^{1*})\,dv + (\sigma_v^e - \sigma_v)\,dz_v^{f*}]; & \text{for } v \in [t, T] \\ dS_v^{2*} = S_v^{2*}[(r_v^{f*} - \delta_v^{2*})\,dv + \sigma_v^e\,dz_v^{f*}]; & \text{for } v \in [t, T], \end{cases}$$

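with the initial conditions $S_t^{1*} = S_t$ and $S_t^{2*} = K$. An alternative representation for the quanto call was provided in Section 7.

Remark 18 Representation formulas involving the change of measure introduced in earlier sections can also be obtained with payoffs that are homogeneous of degree $\nu$. In this case the coefficients of the underlying asset price processes reflect the homogeneity degree of the payoff function. Indeed letting $S^j = S_t^j$ we can always write
$$\begin{aligned} V(S_t, K, r, \delta; \mathcal{F}_t) &= \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right) f(S_\tau, K) \ \Big|\ \mathcal{F}_t\right] \\ &= \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right)\frac{S_\tau^j}{S^j} \times f\left(S_\tau\left(\frac{S^j}{S_\tau^j}\right)^{1/\nu}, K\left(\frac{S^j}{S_\tau^j}\right)^{1/\nu}\right) \Big|\ \mathcal{F}_t\right] \\ &= \sup_{\tau \in \mathcal{S}_{t,T}} E^{j*}\left[\exp\left(-\int_t^\tau \delta_v^j\,dv\right) f(\hat S_\tau, \hat S_\tau^{n+1}) \ \Big|\ \mathcal{F}_t\right] \end{aligned}$$
where $\hat S_v^i = S_v^i (S^j/S_v^j)^{1/\nu}$ for $i = 1, \ldots, n$ and $\hat S_v^{n+1} = K (S^j/S_v^j)^{1/\nu}$ for $v \in [t, T]$. The auxiliary economy has interest rate $\delta^j$ and the equivalent measure $Q^{j*}$ is
$$dQ^{j*} = \exp\left(-\frac{1}{2}\int_0^T \sigma_v^j\sigma_v^j\,dv + \int_0^T \sigma_v^j\,d\tilde z_v\right) dQ.$$
The process $dz_v^{j*} = -d\tilde z_v + \sigma_v^j\,dv$, for $v \in [0, T]$, is a $Q^{j*}$-Brownian motion process.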

8 Changes of numeraire and representation of prices In the financial markets of the previous sections the price of a contingent claim is the expectation of its discounted payoff where discounting is at the riskfree rate and the expectation is taken under the risk neutral measure. This standard representation formula is implied by the ability to replicate the claim’s payoff using a suitably constructed portfolio of the basic securities in the model. Since symmetry properties are obtained by passing to a new numeraire a natural question is whether contingent claims that are attainable in the basic financial markets are also attainable in the economy with new numeraire. This question is in fact essential for interpretation purposes since the symmetry properties above implicitly assume that the renormalized claims can be priced in the new numeraire economy and that their price corresponds to the one in the original economy. For the case of nondividend paying assets Geman, El Karoui and Rochet (1995) prove that contingent claims that are attainable in one numeraire are also attainable in any other numeraire and that the replicating portfolios are the same. Our next theorem provides an extension of this result to dividend-paying assets. The framework of section 2 with Brownian filtration is adopted for convenience only; the results are valid for more general filtrations. Theorem 19 Consider an economy with Brownian filtration and complete financial market with n risky assets and one riskless asset. Suppose that risky assets pay dividends and that their prices follow Itˆo processes (37), and that the riskless asset pays interest at the rate r . Assume that all the coefficients are progressively measurable and bounded processes. If a contingent claim’s payoff is attainable in a given numeraire then it is also attainable in any other numeraire. The replicating portfolio is the same in all numeraires. Proof of Theorem 19 Let i = 0 denote the riskless asset. 
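The gains from trade in the primary assets are
\[
dG_t^i \equiv dS_t^i + S_t^i \delta_t^i\,dt = S_t^i\left[r_t\,dt + \sigma_t^i\,d\tilde z_t\right], \quad \text{for } i = 1, \ldots, n,
\]
\[
dG_t^0 \equiv dB_t = B_t r_t\,dt, \quad \text{for } i = 0.
\]
For $i = 0, \ldots, n$, gains from trade expressed in numeraire $j$ are
\[
G_t^{i,j} = \frac{S_t^i}{S_t^j} + \int_0^t \frac{1}{S_v^j} S_v^i \delta_v^i\,dv \tag{40}
\]
so that
\[
\begin{aligned}
dG_t^{i,j} &= \frac{dS_t^i}{S_t^j} + S_t^i\,d\left(\frac{1}{S_t^j}\right) + d\left[S^i, \frac{1}{S^j}\right]_t + \frac{1}{S_t^j} S_t^i \delta_t^i\,dt \\
&= \frac{1}{S_t^j}\,dG_t^i + S_t^i\,d\left(\frac{1}{S_t^j}\right) + d\left[S^i, \frac{1}{S^j}\right]_t.
\end{aligned}
\]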

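Now let $\pi^i$ represent the amount invested in asset $i$ and consider a portfolio $(\pi^0, \pi) \in \mathbb{R}^{n+1}$ such that $\int_0^T \pi_v' \sigma_v \sigma_v' \pi_v\,dv < \infty$ ($P$-a.s.). The wealth process $X$ generated by $N$, where $N^j = \pi^j / S^j$, $j = 0, \ldots, n$, represents the number of shares of each asset in the portfolio, satisfies
\[
dX_t = \sum_{i=0}^n N_t^i\,dG_t^i
\]
and $X_t = \sum_{i=0}^n N_t^i S_t^i$ (this portfolio is self financing since all dividends are reinvested). Using Itô's lemma gives
\[
\begin{aligned}
d\left(\frac{X_t}{S_t^j}\right) &= \sum_{i=0}^n N_t^i \frac{dG_t^i}{S_t^j} + X_t\,d\left(\frac{1}{S_t^j}\right) + \sum_{i=0}^n N_t^i\,d\left[G^i, \frac{1}{S^j}\right]_t \\
&= \sum_{i=0}^n N_t^i \left(\frac{dG_t^i}{S_t^j} + S_t^i\,d\left(\frac{1}{S_t^j}\right) + d\left[S^i, \frac{1}{S^j}\right]_t\right) \\
&= \sum_{i=0}^n N_t^i\,dG_t^{i,j}
\end{aligned}
\]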

i.e. the normalized wealth process can be synthesized in the new numeraire economy in which all asset prices have been deflated by the numeraire asset j. Furthermore the investment policy which achieves normalized wealth is the same as in the original economy. Consequently, any deflated payoff is attainable in the new numeraire economy when the (undeflated) payoff is attainable in the original economy.

Remark 20 (i) The proper definition of gains from trade in the new numeraire is instrumental in the proof above. Since dividends are paid over time they must be deflated at a discount rate which reflects the timing of the cash flows. This explains the discount factor inside the integral of dividends in (40). (ii) Note that Theorem 19 applies even if the numeraire chosen is a portfolio of assets or any other progressively measurable process instead of one of the primitive assets. It also applies when the portfolio is not self financing, for example when there are infusions or withdrawals of funds over time. (iii) The results above apply for payoffs that are received at fixed times as well as at stopping times of the filtration: if there exists a trading strategy that attains the random payoff $X_\tau$ where $\tau \in \mathcal{S}_{0,T}$ in the original financial market then the normalized payoff $X_\tau / S_\tau^j$ is attainable in the economy with numeraire asset $j$.

Our next result now follows easily from the above.

Theorem 21 Suppose that asset $j$ serves as numeraire and that $S^j$ satisfies (37). Define the probability measure $Q^{j*}$ by

\[
dQ^{j*} = \frac{\exp\left(-\int_0^T (r_v - \delta_v^j)\,dv\right) S_T^j}{S_0^j}\,dQ
= \exp\left(-\frac{1}{2}\int_0^T \sigma_v^j \sigma_v^{j\prime}\,dv + \int_0^T \sigma_v^j\,d\tilde z_v\right) dQ \tag{41}
\]
and consider the discount rate $\delta^j$. Then the discounted prices of primary securities expressed in numeraire $j$ are $Q^{j*}$-supermartingales (discounted gains from trade in numeraire $j$ are $Q^{j*}$-martingales) and the price of any attainable security in the original economy can be represented as the expected discounted value of its cash flows expressed in numeraire $j$ where the discount rate is $\delta^j$ and the expectation is under the $Q^{j*}$-measure.

Proof of Theorem 21 Using definition (40) of gains from trade expressed in numeraire $j$ and Itô's lemma gives
\[
\begin{aligned}
dG_t^{i,j} &= \frac{1}{S_t^j}\,dS_t^i + S_t^i\,d\left(\frac{1}{S_t^j}\right) + \frac{1}{S_t^j} S_t^i \delta_t^i\,dt + d\left[S^i, \frac{1}{S^j}\right]_t \\
&= \frac{1}{S_t^j} S_t^i \left[r_t\,dt + \sigma_t^i\,d\tilde z_t\right] + S_t^i \frac{1}{S_t^j}\left[(\delta_t^j - r_t + \sigma_t^j \sigma_t^{j\prime})\,dt - \sigma_t^j\,d\tilde z_t\right] - S_t^i \frac{1}{S_t^j} \sigma_t^i \sigma_t^{j\prime}\,dt \\
&= \frac{1}{S_t^j} S_t^i \left[(\delta_t^j + (\sigma_t^j - \sigma_t^i)\sigma_t^{j\prime})\,dt + (\sigma_t^i - \sigma_t^j)\,d\tilde z_t\right] \\
&= \frac{1}{S_t^j} S_t^i \left[\delta_t^j\,dt + (\sigma_t^j - \sigma_t^i)\,dz_t^{j*}\right],
\end{aligned}
\]
where $dz_t^{j*} = -d\tilde z_t + \sigma_t^{j\prime}\,dt$ is a $Q^{j*}$-Brownian motion process. Defining $S_t^{i*} = S_t^i / S_t^j$ we can then write
\[
dS_t^{i*} = S_t^{i*}\left[(\delta_t^j - \delta_t^i)\,dt + (\sigma_t^j - \sigma_t^i)\,dz_t^{j*}\right]
\]
i.e. the discounted price of asset $i$ in numeraire $j$, $\exp\left(-\int_0^t \delta_v^j\,dv\right) S_t^{i*}$, is a $Q^{j*}$-supermartingale where discounting is at the rate $\delta^j$. Alternatively the discounted gains from trade process
\[
\exp\left(-\int_0^t \delta_v^j\,dv\right) S_t^{i*} + \int_0^t \exp\left(-\int_0^v \delta_u^j\,du\right) S_v^{i*} \delta_v^i\,dv
\]
is a $Q^{j*}$-martingale. Thus, we can write the representation formula
\[
S_t^{i*} = E^{j*}\left[\exp\left(-\int_t^T \delta_v^j\,dv\right) S_T^{i*} + \int_t^T \exp\left(-\int_t^v \delta_u^j\,du\right) S_v^{i*} \delta_v^i\,dv \,\Big|\, \mathcal{F}_t\right].
\]

The relations satisfied by primary asset prices also apply to portfolios of primary assets and therefore to any contingent claim that is attainable. This completes the proof of the theorem.

Remark 22 When a dividend-paying primary asset price is chosen as deflator the auxiliary economy has an interest rate equal to the dividend rate of the deflator. In this new numeraire cash is converted into an asset that pays a dividend rate equal to the interest rate in the original economy. If we choose the discounted price $\widehat S_t^j = \exp\left(-\int_0^t (r_v - \delta_v^j)\,dv\right) S_t^j$, which is a martingale, as numeraire the process $S_t^{i*} = S_t^i / \widehat S_t^j$ satisfies
\[
dS_t^{i*} = S_t^{i*}\left[(r_t - \delta_t^i)\,dt + (\sigma_t^j - \sigma_t^i)\,dz_t^{j*}\right]
\]
and its discounted value at the riskfree rate is a $Q^{j*}$-supermartingale where $Q^{j*}$ is defined in (41). With this choice of numeraire the interest rate remains unchanged in the auxiliary economy. Cash is converted into an asset that pays a dividend rate equal to the interest rate and thus has null drift (martingale).

Remark 23 (i) Note that a payoff expressed in a new numeraire is not necessarily the same as the payoff evaluated at normalized underlying asset prices (i.e. prices expressed in the new numeraire). There is clearly equivalence when the payoff is homogeneous of degree one. With homogeneity of degree ν the payoff in the new numeraire is equivalent to the payoff function evaluated at underlying asset prices that are normalized by a power of the numeraire price. Normalized asset prices (in the payoff function) then differ from asset prices expressed in the new numeraire. (ii) A byproduct of Theorem 21 is a generalized "symmetry" property which applies to any payoff function. In this interpretation of the property the symmetric contract is simply the payoff expressed in the new numeraire.
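The representation formula of Theorem 21 can be checked numerically in a simple special case. The sketch below assumes constant scalar coefficients and a single Brownian motion, which is far narrower than the setting of the theorem; the function name, the parameter values and the discretization are illustrative assumptions and not part of the text. It simulates $S^{i*}$ under $Q^{j*}$ and averages the discounted gains from trade, which should return the initial normalized price $S_0^{i*}$.

```python
import math
import random


def discounted_gains_mc(s0_star, delta_i, delta_j, sigma_i, sigma_j,
                        horizon=1.0, n_steps=100, n_paths=4000, seed=0):
    """Monte Carlo estimate of E^{j*}[exp(-delta_j*T) S_T^{i*}
    + int_0^T exp(-delta_j*v) S_v^{i*} delta_i dv] with constant coefficients.

    Under Q^{j*} the normalized price S^{i*} = S^i / S^j has drift
    (delta_j - delta_i) and volatility (sigma_j - sigma_i).
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    vol = sigma_j - sigma_i
    log_drift = delta_j - delta_i - 0.5 * vol * vol
    total = 0.0
    for _ in range(n_paths):
        s = s0_star
        dividends = 0.0
        for k in range(n_steps):
            t = k * dt
            left = math.exp(-delta_j * t) * s * delta_i
            # exact lognormal step for the normalized price
            s *= math.exp(log_drift * dt + vol * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            right = math.exp(-delta_j * (t + dt)) * s * delta_i
            # trapezoidal rule for the discounted dividend integral
            dividends += 0.5 * (left + right) * dt
        total += math.exp(-delta_j * horizon) * s + dividends
    return total / n_paths


# Hypothetical inputs: S_0^i = 100, S_0^j = 80, so S_0^{i*} = 1.25.
estimate = discounted_gains_mc(100.0 / 80.0, delta_i=0.03, delta_j=0.05,
                               sigma_i=0.2, sigma_j=0.3)
print(f"Monte Carlo value: {estimate:.4f}  (initial normalized price: 1.2500)")
```

With constant coefficients the expectation can also be computed by hand: $E^{j*}[S_v^{i*}] = S_0^{i*} e^{(\delta^j - \delta^i)v}$, so the two terms sum to $S_0^{i*}\left[e^{-\delta^i T} + (1 - e^{-\delta^i T})\right] = S_0^{i*}$, which is what the simulation approximates.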


Some extensions are worth mentioning.

Remark 24 Note that the results on the replication of attainable contingent claims, their financing portfolios and their representation under new measures are valid even when markets are incomplete. Indeed if the claims under consideration can be replicated in a given incomplete market equilibrium (i.e. if the claims' payoffs live in the asset span) so can they under a change of numeraire. The results are also valid when the market is effectively complete (single agent economies). In this case even when claims' payoffs cannot be duplicated they have a unique price which can be expressed in different forms corresponding to various choices of numeraire.

9 Conclusion

In this paper we have reviewed and extended recent results on PCS. Features of the models considered include (i) financial markets with progressively measurable coefficients, (ii) random maturity options, (iii) options on multiple underlying assets, (iv) occupation time derivatives and (v) payoff functions that are homogeneous of degree ν ≠ 1. One important element in the proofs is the ability to renormalize a vector of prices and parameters which determine the payoff of the contract. Homogeneity of degree ν is sufficient in that regard but it is not a necessary condition. Another important element in the proofs is the separation between the role of informational variables and the change of measure (numeraire). Indeed while the change of measure converts the underlying assets into normalized or symmetric assets in the auxiliary financial market the information sets in the two markets are kept the same. This separation enables us to derive symmetry properties even for financial markets in which prices do not follow Markov processes. In the context of diffusion models the change of measure is instrumental for obtaining symmetry properties of option prices without restricting volatility coefficients.

Some of the results in the paper can be readily extended. Symmetry-like properties hold for multiasset contracts even when the payoff functions are not homogeneous of some degree ν (for instance when homogeneity of different degrees holds relative to different subsets of the underlying asset prices). In this instance normalized prices in the auxiliary economy involve further adjustments to dividends and volatilities. Likewise the methodology reviewed in this paper also applies, in principle, to complete financial markets with general semimartingales or even to incomplete markets provided that the securities under consideration lie in the asset span.
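The renormalization that homogeneity makes possible can be illustrated with a small sketch. The payoff below, max(s1^ν − s2^ν, 0), is a hypothetical example chosen only because it is homogeneous of degree ν; it is not a contract taken from the text. The code compares the payoff expressed in the numeraire (divided by the numeraire price) with the payoff function evaluated at prices normalized by the 1/ν power of the numeraire price.

```python
import math


def power_exchange_payoff(s1, s2, nu):
    """Hypothetical payoff max(s1**nu - s2**nu, 0); homogeneous of degree nu."""
    return max(s1 ** nu - s2 ** nu, 0.0)


def payoff_in_numeraire(s1, s2, nu):
    """Payoff expressed in units of asset 2 (the numeraire)."""
    return power_exchange_payoff(s1, s2, nu) / s2


def payoff_at_normalized_prices(s1, s2, nu):
    """Payoff function evaluated at prices divided by s2**(1/nu)."""
    scale = s2 ** (1.0 / nu)
    return power_exchange_payoff(s1 / scale, s2 / scale, nu)


s1, s2 = 3.0, 2.0
for nu in (1.0, 2.0):
    in_numeraire = payoff_in_numeraire(s1, s2, nu)
    normalized = payoff_at_normalized_prices(s1, s2, nu)
    at_numeraire_prices = power_exchange_payoff(s1 / s2, 1.0, nu)
    print(nu, in_numeraire, normalized, at_numeraire_prices)
```

For ν = 1 the three quantities coincide. For ν = 2 the first two agree, while the payoff evaluated at prices expressed in the numeraire (s1/s2 and 1) differs, in line with the distinction drawn in Remark 23 (i).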


References

Akahori, J. (1995), Some formulae for a new type of path-dependent option, Annals of Applied Probability 5, 383–8.
Bensoussan, A. (1984), On the theory of option pricing, Acta Applicandae Mathematicae 2, 139–58.
Bjerksund, P. and Stensland, G. (1993), American exchange options and a put–call transformation: a note, Journal of Business, Finance and Accounting 20, 761–4.
Black, F. and Scholes, M. (1973), The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–54.
Broadie, M. and Detemple, J.B. (1995), American capped call options on dividend-paying assets, Review of Financial Studies 8, 161–91.
Broadie, M. and Detemple, J.B. (1997), The valuation of American options on multiple assets, Mathematical Finance 7, 241–85.
Carr, P. and Chesney, M. (1996), American put call symmetry. Working paper.
Carr, P., Jarrow, R. and Myneni, R. (1992), Alternative characterizations of American put options, Mathematical Finance 2, 87–106.
Chesney, M. and Gibson, R. (1993), State space symmetry and two factor option pricing models, in J. Janssen and C. H. Skiadas, eds, Applied Stochastic Models and Data Analysis. World Scientific Publishing Co, Singapore.
Chesney, M., Jeanblanc-Picqué, M. and Yor, M. (1997), Brownian excursions and Parisian barrier options, Advances in Applied Probability 29, 165–84.
Dassios, A. (1995), The distribution of the quantile of a Brownian motion with drift and the pricing of related path-dependent options, Annals of Applied Probability 5, 389–98.
Detemple, J.B., Feng, S. and Tian, W. (2000), The valuation of American options on the minimum of dividend-paying assets. Working paper, Boston University.
Gao, B., Huang, J.Z. and Subrahmanyam, M. (2000), The valuation of American barrier options using the decomposition technique, Journal of Economic Dynamics and Control, to appear.
Garman, M. (1989), Recollection in Tranquility, Risk 24, 1783–827.
Geman, H., El Karoui, N. and Rochet, J.C. (1995), Changes of numeraire, changes of probability measure and option pricing, Journal of Applied Probability 32, 443–58.
Girsanov, I.V. (1960), On transforming a certain class of stochastic processes by absolutely continuous substitution of measures, Theory of Probability and Its Applications 5, 285–301.
Goldman, B., Sosin, H. and Gatto, M. (1979), Path-dependent options: buy at the low, sell at the high, Journal of Finance 34, 1111–27.
Grabbe, O. (1983), The pricing of call and put options on foreign exchange, Journal of International Money and Finance 2, 239–53.
Hugonnier, J. (1998), The Feynman–Kac formula and pricing occupation time derivatives. Working paper, ESSEC.
Jacka, S.D. (1991), Optimal stopping and the American put, Mathematical Finance 1, 1–14.
Karatzas, I. (1988), On the pricing of American options, Appl. Math. Optim. 17, 37–60.
Karatzas, I. and Shreve, S. (1988), Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.
Kholodnyi, V.A. and Price, J.F. (1998), Foreign Exchange Option Symmetry. World Scientific Publishing Co., New Jersey.
Kim, I.J. (1990), The analytic valuation of American options, Review of Financial Studies 3, 547–72.
Linetsky, V. (1999), Step options, Mathematical Finance 9, 55–96.
Margrabe, W. (1978), The value of an option to exchange one asset for another, Journal of Finance 33, 177–86.
McDonald, R. and Schroder, M. (1990), A parity result for American options, Journal of Computational Finance. Working paper, Northwestern University.
McKean, H.P. (1965), A free boundary problem for the heat equation arising from a problem in mathematical economics, Industrial Management Review 6, 32–9.
Merton, R.C. (1973), Theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–83.
Miura, R. (1992), A note on look-back option based on order statistics, Hitotsubashi Journal of Commerce and Management 27, 15–28.
Rubinstein, M. (1991), One for another, Risk.
Schroder, M. (1999), Changes of numeraire for pricing futures, forwards and options, Review of Financial Studies 12, 1143–63.

4 Purely Discontinuous Asset Price Processes Dilip B. Madan

1 Introduction

Prices of assets determined in highly liquid financial markets are generally viewed as continuous functions of time. This is true of the Black–Scholes (1973), and Merton (1973) model of geometric Brownian motion for the dynamics of the price of a stock, and of its many successors that include the stochastic volatility models of Hull and White (1987), Heston (1993) and the more recent advances into modeling the evolution of the local volatility surface by Derman and Kani (1994), and Dupire (1994). Jumps or discontinuities, when considered, have been added on as an additional orthogonal compound Poisson process also impacting the stock, as for example in Press (1967), Merton (1976), Cox and Ross (1976), Naik and Lee (1990), Bates (1996), and Bakshi and Chen (1997). This class of models is broadly referred to as jump-diffusion models and as the name suggests they are mixture models studying the high activity and low activity events by using two orthogonal modeling strategies.

The purpose of this chapter is to present the case for an alternative approach that stands in sharp contrast to the above mentioned models and synthesizes the study of high and low activity price movements using a class of purely discontinuous price processes. The contrast with the above class of models is that the processes advocated here have no continuous component, as all jump-diffusions must have, and furthermore, the discontinuities are infinite in number with moves of larger sizes coming at a slower rate than moves of smaller sizes. Additionally the jump-diffusion models have what is called infinite variation, in that the sum of absolute price moves is infinity in any interval and one must square these moves before their sum is finite (the property of finite quadratic variation) while the processes we advocate are of finite variation. Unlike jump-diffusions, our processes model price up ticks and down ticks separately and the price process can be decomposed as the difference of two increasing processes representing the increases and decreases of

prices. We shall also demonstrate that the finite variation property of the proposed models also enhances their robustness and thereby their relevance for economic modeling.

This chapter summarizes the findings of research that I have conducted over the past 15 years in collaboration with a number of coauthors. The research is still ongoing with a number of new and interesting developments already in place, but we shall focus attention on what has been learned to date. The papers that are summarized here are Madan and Seneta (1990), Madan and Milne (1991), Madan, Carr and Chang (1998), Carr and Madan (1998), (1998), Geman, Madan and Yor (2000), Bakshi and Madan (1998a,b).1

The case for purely discontinuous price processes is, as it should be, an argument with many facets. First we summarize the empirical findings on the study of both the statistical and risk neutral processes and observe the empirical need to consider discontinuous processes as relevant candidates. Statistical reality by itself, however, is not a convincing argument. Unsupported by a theoretical understanding of market fundamentals, statistical modeling is at best a spurious coincidence. One must consider the implications of a fundamental economic analysis. We show that economic analysis with the help of some deep structural mathematical results points in the same direction: the use of purely discontinuous price processes.

Statistical reality and theoretical conviction are ultimately no match for success. If the wrong model is brilliantly successful in delivering results, while the right one is relatively barren then we have little choice but to work with the incorrect model, bearing in mind its limitations. To address this concern we present some of the successes of modeling with a purely discontinuous price process. We match the success of Brownian motion in option pricing and portfolio management with the success of the purely discontinuous VG process obtained on time changing Brownian motion by a gamma process. The improvement in option pricing is clear, eliminating the implied volatility smile in the strike direction, and we are able to go further in portfolio management and study the optimal management of portfolios of derivative securities, a question that is relatively untouched in the diffusion context. In fact we successfully calibrate observed derivative portfolios as optimal and employ revealed preference methods to infer what we call the position measure but is better known as the personalized state price density. The perspective of purely discontinuous price processes, we conclude, is not only correct from a statistical and theoretical viewpoint, but is also rich in results and interesting applications.

1 The last of these papers is a working paper and can be obtained from my web site: www.dilip-madan.com.

The statistical findings we summarize confirm from a variety of perspectives that the local motion of the stock price is not Gaussian. This is true of both

the time series of moves and the pricing distribution of moves as reflected in option prices. Apart from these standard tests of normality we also consider the behavior of extremal events. Relying on asymptotic laws of maxima and minima of independent sampled observations (see Embrechts, Klüppelberg and Mikosch (1997)), we employ long time series of returns and reject the hypothesis that asset return distributions are locally Gaussian. They lie in the domain of attraction of the Fréchet distribution that includes the log gamma formulation of the VG process. Additionally we investigate empirically the relationship between arrival rates of jumps of different sizes with the jump size. The focus of our attention is on whether arrival rates display a monotonicity with respect to size, decreasing as the size rises, and whether the assumption of an infinite arrival rate is supported by a casual analysis of arrival rates. We conclude in favor of infinite and decreasing arrival rates.

From a theoretical perspective, we concentrate on the implications of no arbitrage, a property that is fundamental to all models for the asset price process. This property is shown to imply that asset prices in continuous time must be modeled by a time changed Brownian motion. The question at issue is then the nature of the time change. We investigate whether the time change could be continuous, with the resultant implication of the continuity of the price process, and show that this is possible only in economies where returns are locally Gaussian and time is locally deterministic and non-random. Given the overwhelming evidence on the lack of a locally Gaussian return distribution we are led to entertain the lack of continuity of the price process. This modeling choice is also consistent with observations on studying the relationship between time changes and economic activity, whereby we learn that time changes are related to some measure of the rate of arrival of orders or trades. As the latter have a random element, and are not locally deterministic, this suggests that such properties are inherited by the time change and hence once again we are led to the class of discontinuous price processes.

Within the class of discontinuous processes we begin our search by focusing attention in the first instance on processes with identical and independently distributed increments: a property shared with Brownian motion, the base model for the underlying uncertainty in the continuous case. This leads naturally via the Lévy–Khintchine theorem for such processes to considering Lévy processes characterized by their Lévy densities whose empirical counterparts are precisely the relationship between arrival rates of jumps of different sizes and the jump size noted earlier in our empirical analysis. When the Lévy density integrates the absolute value of the jump size in the neighborhood of zero, a case we restrict attention to, the process has finite variation and can be decomposed into the difference of two increasing processes that constitute our models for the price up and down ticks. We suggest this model as a partial equilibrium model that clears market buy orders with

an up tick price response as the order is cleared through the limit sell book. The converse being the case for market sell orders cleared through the limit buy book at a price down tick.

An alternative and interesting economic model for price responses goes back to traditional dynamic models of price adjustment that represent the rate of adjustment as a function of the level of excess demand in the economy. We term this function relating the rate of change of prices to excess demand, the force function of the economy. Modeling excess demand by Brownian motion we may write the price process as the difference between price increases occurring during positive excursions of Brownian motion less the cumulated decreases that occur on negative excursions of Brownian motion. Such a price process is of course open to arbitrage by trades that reverse themselves during a single excursion of Brownian motion. For example, on a single positive excursion, one buys at a price and then sells at a higher price in the same excursion. To avoid such arbitrage, we restrict equilibrium trading to equilibrium times by requiring these to occur at the zero set of Brownian motion. This is organized by evaluating the disequilibrium price process at the inverse local time of Brownian motion. The resulting price process inherits the property of being purely discontinuous from inverse local time, and the process is the difference of two increasing processes that cumulate price responses during positive and negative excursions.

The two models of discontinuous price processes, (i) Lévy processes and (ii) integrals of force functionals of Brownian motion to inverse local time, are surprisingly related under the hypothesis of complete monotonicity of the Lévy density.2 Every force function has associated with it a completely monotone Lévy density and for every completely monotone Lévy density there exists an equivalent representation of the price process using a force function. The equivalence is however a consequence of some deep results from number theory and hence the surprise.

2 The Lévy density is completely monotone if each of its two halves on the positive and negative side have the property of sign alternating derivatives or equivalently can be expressed as Laplace transforms of positive functions on the positive half line. Hence, they are essentially mixtures of exponential densities.

We also consider the issue of robustness of the economic model with respect to tolerance of a heterogeneity of views on parameters and observe that the property of bounded variation in the price process is critical for delivering such robustness. Our concern in robustness with respect to views on parameters is that different beliefs should naturally allow for different probabilities, but the probabilities should remain equivalent and not become singular. With infinite variation there are many cases where a change in certain parameters induces singularity of measures.

With the theoretical and statistical foundations in sufficient harmony, and two broad classes of models outlined in sufficient detail, we turn our attention to the

study of particularly rich examples in this class of models. The basic generalization of geometric Brownian motion we introduce is the VG process that introduces two additional parameters providing control over skewness and kurtosis. The model arises on evaluating Brownian motion with drift at a random time given by a gamma process. The volatility of the gamma process provides control over kurtosis while the drift in the Brownian motion before the time change controls skewness. We show that this model is successful in option pricing, eliminating the smile in the strike direction with relative ease. Fundamental to the world of purely discontinuous price processes is the property of options being market completing assets with a genuine role to play in the economy and a natural demand for these assets by investors. Recognizing these properties, we reconsider the problem of optimal derivative investment in continuous time, keeping in place Mertonian (1971) objective functions for the investor but expanding the asset space to include all European options on the underlying stock for all strikes and maturities. We find that for HARA utilities and VG statistical and risk neutral measures the derivative investment problem may be solved in closed form and leads in such economies to a healthy demand for at-the-money short maturity options: precisely the options with the greatest liquidity in financial markets. One may view the Black–Scholes economy as teaching us about stock delta positions in option hedging, while the first lessons of investment in purely discontinuous high activity price processes are about positioning in short maturity at-the-money options. With some courage we consider replicating actual trader derivative positions as optimal ones, allowing in the process adjustments in the level of risk aversion in power utility and a view on subjective kurtosis that may differ from the statistically observed kurtosis level. 
Kurtosis is particularly hard to estimate as its variance is of the order of the eighth moment. With this two dimensional flexibility, we are amazingly successful in many instances in calibrating actual spot slides as optimal wealth responses from the perspective of our continuous time optimal derivative investment model.3

3 The spot slide of a derivatives book graphs the value of the book as a function of the level of the underlying, typically varying the underlying in the range plus or minus 30% of spot for equity assets.

Having inferred risk aversion and the characteristics of subjective probability consistent with replicating observed positions as optimal, we may construct the personalized state price density that values options at a dollar amount yielding a marginal utility that matches the future expected marginal utility from holding the option. We call this state price density the position measure and provide explicit constructions of position measures, contrasting them with the risk neutral and statistical measures. We find generally that position measures are closer to the statistical measure and lie between the statistical and risk neutral measure. This is consistent with the view that traders are aware of relative frequency of occurrence of market moves and their prices and accordingly make markets in option contracts.

The outline for the rest of the chapter is as follows. Section 2 presents a summary of the statistical results. The economic consequences of no arbitrage are described in section 3, while the two equivalent but apparently different economic models of the price process are summarized in section 4. The task of constructing specific examples consistent with the statistical and economic observations of these sections is taken up in section 5. The basic operating model of the VG process is introduced in section 6. Its successes in option pricing are summarized in section 7. Optimal solutions to the asset allocation problem with derivatives are presented in section 8 and employed to infer position measures in section 9. Section 10 concludes.
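The VG construction described in this introduction, Brownian motion with drift evaluated at a gamma time, can be sketched in a few lines. The parameter names `sigma`, `nu` and `theta` and all numerical values below are assumptions made for illustration; the chapter's own parameterization is introduced later and may differ in detail. The sketch draws increments over a unit interval and reports sample skewness and kurtosis, to show how the variance rate of the gamma time change raises kurtosis and how the drift before the time change controls the sign of skewness.

```python
import math
import random


def vg_increments(n, dt, sigma, nu, theta, seed=0):
    """Draw n increments of Brownian motion with drift theta and volatility
    sigma, evaluated at a gamma time with mean dt and variance nu * dt."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        g = rng.gammavariate(dt / nu, nu)  # shape dt/nu, scale nu
        out.append(theta * g + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0))
    return out


def sample_skewness_kurtosis(xs):
    """Return (skewness, kurtosis) computed from central sample moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical parameter values, chosen only for illustration.
symmetric = vg_increments(50000, dt=1.0, sigma=0.2, nu=0.5, theta=0.0, seed=1)
skewed = vg_increments(50000, dt=1.0, sigma=0.2, nu=0.5, theta=-0.2, seed=2)

skew_sym, kurt_sym = sample_skewness_kurtosis(symmetric)
skew_neg, kurt_neg = sample_skewness_kurtosis(skewed)
print(f"theta = 0.0 : skewness {skew_sym:.2f}, kurtosis {kurt_sym:.2f}")
print(f"theta = -0.2: skewness {skew_neg:.2f}, kurtosis {kurt_neg:.2f}")
```

For zero drift the increment is a normal variance mixture, so its kurtosis is 3(1 + nu/dt), which equals 4.5 for the values used here; a negative drift produces negative skewness.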

2 Properties of the price process

This section summarizes some of the broad properties of the statistical and risk neutral price process. We address issues related to the normality of the motion, the behavior of extreme moves and the shape of the density of arrival rates of price moves. The emphasis in all cases is on the movement over short horizons as we view the macro moves as cumulated short moves.

2.1 Long-tailedness of historical returns

We begin by considering some well known results about the long-tailedness of the statistical return distribution and standard chi-square goodness of fit tests of normality of the return distribution. Early results on these issues go back to Fama (1965) where both the independence of daily returns and their long-tailedness is documented. We now have data at much higher frequencies of observation and report in Table 1 results on S&P 500 futures returns at these frequencies. We focus attention on the level of the observed kurtosis and on χ² goodness of fit tests for normality. We observe from Table 1 that the kurtosis is substantially higher than three, the kurtosis level of a normal distribution. The goodness of fit tests also overwhelmingly reject the hypothesis of normality for returns over short durations. We will note later, in the next section, that this has very significant implications for modeling the dynamics of the price process.
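The two diagnostics discussed here, sample kurtosis and a χ² goodness of fit test against a fitted normal distribution, can be sketched as follows. The data are synthetic (a hypothetical two-component normal mixture standing in for returns), the choice of ten equiprobable bins is an assumption, and none of this reproduces the procedure or the numbers behind Table 1.

```python
import bisect
import math
import random
from statistics import NormalDist


def sample_kurtosis(xs):
    """Fourth central moment divided by the squared variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def chi_square_normality(xs, n_bins=10):
    """Chi-square statistic comparing observed counts with the counts
    expected under a normal fitted by sample mean and standard deviation,
    using bins that are equiprobable under the fitted normal."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    fitted = NormalDist(mean, sd)
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    counts = [0] * n_bins
    for x in xs:
        counts[bisect.bisect_right(edges, x)] += 1
    expected = n / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


# Synthetic heavy-tailed sample: 90% N(0, 1) and 10% N(0, 9).
rng = random.Random(1)
heavy = [rng.gauss(0.0, 3.0 if rng.random() < 0.1 else 1.0) for _ in range(5000)]

# With 10 bins and 2 estimated parameters there are 7 degrees of freedom;
# the 5% critical value of the chi-square distribution is then 14.067.
CRITICAL_5PCT_7DOF = 14.067
kurt = sample_kurtosis(heavy)
stat = chi_square_normality(heavy)
print(f"kurtosis {kurt:.2f}, chi-square {stat:.1f}, critical {CRITICAL_5PCT_7DOF}")
```

A statistic above the critical value rejects normality at the 5% level, which is the pattern the text reports for short-horizon returns.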

Table 1. High frequency tests of normality, S&P 500 futures returns, Nov. 1992–Feb. 1993.

                           1 Min.     15 Min.    Hourly     Daily
Kurtosis                   58.59      13.85      5.97       10.31
χ² test statistic          437.12     931.85     98.323     123.84
χ² critical value (5%)     9.26       5.7        3.57       0.989

Source: Dissertation of Thierry Ané, University of Paris IX Dauphine and ESSEC 1997.

2.2 Long-tailedness in risk neutral distribution

Apart from the statistical return distribution we are also interested in the risk neutral or pricing distribution as implied by option prices. This distribution assesses the

futures price of a binary derivative that pays a dollar at a future date if the stock price is in a certain interval, as opposed to the likelihood of the occurence of this event. The distribution may be recovered from observed option prices with the density being given by the second derivative of the European call option price, of maturity matching the future date, with respect to the option strike as derived in Ross (1976a) and Breeden and Litzenberger (1978). If the distribution describing the current prices of derivatives written on future stock price events is Gaussian then an implication is that the implied volatility obtained from equating the option price to the value given by the Black–Scholes formula, should be constant as one varies the strike for a fixed maturity. On the other hand, if this density is symmetric about a point, then the implied volatilities, though no longer necessarily flat with respect to strike, should be symmetric about a point as well. Both these implications are contradicted by what has come to be known as the implied volatility smile. We present in Table 2 below, the implied volatility smile on S&P 500 index options, based on out of the money options using only puts for strikes below, and calls for strikes above, the spot price. These are the more liquid option markets. The time period covered is June 1988 to May 1991 and we focus attention just on the short maturity options. The choice of this focus is motivated by our intention of studying the dynamics of the stock price process, which is but the cumulation of short maturity moves. We observe from Table 2, reading up the columns, that as the strike level rises, the implied volatility falls sharply followed by a smaller rise as one crosses the level of the spot price. We therefore clearly have a smile shape in the short maturity implied volatility, but the left and right sides are not symmetric. 
We may conclude from these observations that the left tail of the pricing distribution is fatter than the right tail, and this reflects a negative skewness in the distribution. The existence of the smile itself is evidence of excess kurtosis (relative to the normal distribution) in this density.
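The recovery of the pricing density from call prices can be illustrated numerically. The sketch below is not taken from the chapter: it assumes undiscounted (forward) Black–Scholes call prices, for which the second strike derivative should reproduce the lognormal density, and approximates that derivative by a central second difference. The function names and the parameter values are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def forward_call(F, K, sigma, T):
    """Undiscounted Black-Scholes call price on a forward price F (assumed setting)."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return F * norm_cdf(d1) - K * norm_cdf(d1 - s)

def implied_density(call, K, h=0.01):
    """Central second difference of the call price with respect to the strike."""
    return (call(K + h) - 2.0 * call(K) + call(K - h)) / (h * h)

def lognormal_density(x, F, sigma, T):
    """Density of the terminal price under the lognormal pricing measure."""
    s = sigma * math.sqrt(T)
    z = (math.log(x / F) + 0.5 * s * s) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))

# Illustrative parameters, not taken from the text.
F, sigma, T = 100.0, 0.2, 0.1
call = lambda K: forward_call(F, K, sigma, T)
for K in (90.0, 100.0, 110.0):
    print(K, implied_density(call, K), lognormal_density(K, F, sigma, T))
```

With market call prices in place of `forward_call`, the same second difference would give an estimate of the pricing density that need not be lognormal; in practice the quotes would have to be smoothed or interpolated in strike first.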


Table 2. The smile in implied volatilities at shorter maturities (below 60 days).

Moneyness        June 1988–    June 1989–    June 1990–
(spot/strike)    May 1989      May 1990      May 1991

                 17.27         16.16         19.70
                 16.21         15.10         18.23
                 16.33         15.83         18.65
                 17.42         17.81         20.87
                 19.04         20.65         22.27
1.06             21.84         25.70         25.57

Source: Bakshi, Cao and Chen, Journal of Finance (1997), page 2015.
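The implied volatilities in Table 2 are obtained by equating an observed option price to the Black–Scholes value and solving for the volatility. A minimal sketch of that inversion follows, assuming undiscounted forward prices and a simple bisection search; the function names, search bounds and the example inputs are assumptions made for illustration, not the procedure used by Bakshi, Cao and Chen.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def forward_call(F, K, sigma, T):
    """Undiscounted Black-Scholes call price on a forward price F (assumed setting)."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return F * norm_cdf(d1) - K * norm_cdf(d1 - s)

def implied_vol(price, F, K, T, lo=1e-4, hi=5.0, tol=1e-10):
    """Bisection on sigma; relies on the call price being increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if forward_call(F, K, mid, T) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip at spot/strike = 1.06 and a 30 day maturity (illustrative inputs).
F, K, T = 100.0, 100.0 / 1.06, 30.0 / 365.0
price = forward_call(F, K, 0.2184, T)
print(implied_vol(price, F, K, T))
```

By put-call parity the same volatility is implied by the out-of-the-money put at this strike, which is the instrument actually used below the spot price in Table 2.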

2.3 The behavior of extreme moves

Tables 1 and 2 are classical results on the statistical properties of densities associated with price movements in financial markets. They summarize essentially the narrow behavior of the return distribution, as may be evidenced by noting that most of the returns considered in the time series analysis are the ones with the smaller magnitudes, and the range of moneyness reported in the implied volatility curves is just within six percentage points over an average period of a month. Hence the evidence presented is that of lack of normality in the neighborhood of the zero return, and one might wonder whether at least the tail of the distributions is Gaussian. For the risk neutral distribution this has the implication that the implied volatility curve flattens out as one gets into deep out-of-the-money options on both sides, though the level at which the curves flatten out may be different on each side.

To focus attention on the behavior of the tails of the distribution, with a view to addressing whether these may be Gaussian, we consider the behavior of extremes. It is shown in Embrechts, Klüppelberg and Mikosch (1997) that the asymptotic distribution of the maximum and minimum of independent drawings from a Gaussian distribution is given, up to shift and scale, by the Gumbel distribution. The other possible asymptotic distributions for these extremal events are, again up to shift and scaling, the Weibull and Fréchet distributions. For distributions that have as support the positive half line, the candidate limiting distributions are just the Gumbel and Fréchet distributions.

The analysis of extreme events requires long time series of data and for this purpose we obtained data on daily returns on the Dow–Jones industrial average (DJIA) for the 100 years from 1897 to 1997. Partitioning this data into non-overlapping intervals of 100 days, we constructed a series on the maximum percentage daily rise and the maximum percentage daily drop in the DJIA over the 100 days. We then artificially nested the Gumbel and Fréchet log likelihoods and tested the null hypothesis that the distribution of the extreme event is Gumbel, the limit of the Gaussian tail. Table 3 presents these results.

Table 3. Log-likelihoods of the distribution of extremal price movements: maximum daily percentage rise and fall in the DJIA over 100 day non-overlapping intervals for 100 years.

Maximum daily drop, 100 days
              Gumbel     Fréchet    P-value
1897–1997     768.37     808.58     0.00
1897–1945     380.22     389.98     0.01
1946–1997     409.93     434.74     0.00

Maximum daily rise, 100 days
              Gumbel     Fréchet    P-value
1897–1997     811.66     833.77     0.01
1897–1945     395.79     408.92     0.01
1946–1997     358.33     432.95     0.01

Source: Bakshi and Madan (1998), What is the probability of a stock market crash, working paper, University of Maryland.

Table 3 demonstrates that the normality hypothesis may also be rejected as a model for the tails of the statistical distribution of daily returns. Given the evidence on excess kurtosis, we would conjecture that these tails are heavier than Gaussian, and if the property is shared with the risk neutral distribution, as we suspect it is, then implied volatilities must continue to rise as we get deeper out-of-the-money, i.e., the implied volatility curves do not flatten out at either end of the strike range. At this point we do not have documentary evidence on very deep out-of-the-money implied volatilities, but observations from current market quotes on S&P 500 index options would suggest that this may well be the case.
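The construction of the extremal series behind Table 3 (non-overlapping blocks of 100 daily returns, keeping the largest rise and the largest drop in each) can be sketched as follows. This is an illustrative helper only: the function name is an assumption, an incomplete trailing block is simply dropped, and the input below is a synthetic deterministic sequence rather than DJIA data.

```python
def block_extremes(returns, block=100):
    """Largest rise and largest drop in each non-overlapping block of returns.

    Drops are reported as positive magnitudes. A trailing block with fewer
    than `block` observations is ignored (an assumption of this sketch).
    """
    rises, drops = [], []
    for start in range(0, len(returns) - block + 1, block):
        window = returns[start:start + block]
        rises.append(max(window))
        drops.append(-min(window))
    return rises, drops

# Synthetic alternating returns, purely to exercise the function.
returns = [(-1) ** i * i / 1000 for i in range(250)]
rises, drops = block_extremes(returns)
print(rises, drops)
```

The two resulting series are the samples to which the Gumbel and Fréchet likelihoods would then be fitted.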

2.4 The structure of the arrival rates of price moves

The arguments of this chapter lead us to considering, as models for the dynamics of stock prices, purely discontinuous processes. Such processes, when they have independent and identically distributed increments, are characterized by their Lévy densities, which essentially count the rate of arrival of jumps of different sizes. These are a wide class of processes, and structural properties, if supported by data, are beneficial in limiting the class of models that need to be considered. One such structural property is complete monotonicity of the Lévy density, whereby large jumps occur at a smaller rate than small jumps. This is a reasonable property to expect, as market participants facing price increases on buy orders and decreases on sell orders have an incentive to minimize these impacts.

Another structural property is the aggregate arrival rate of jumps or moves, which could be finite or infinite. We note in this regard that Brownian motion is an infinite activity process, as the actual sum of absolute price moves is itself infinite for Brownian motion, it being a process of infinite variation. We note further that jump-diffusions employ a compound-Poisson process for the arrival of jumps that have a finite arrival rate, with the magnitude of jumps having, once again, a normal distribution. The models we propose in this chapter have infinite arrival rates of jumps and in this regard they are closer to Brownian motion, but unlike Brownian motion they are processes of finite variation. This requires that the integral of the Lévy density be infinite, but the density times the jump size should have a finite integral near zero.

A typical Lévy density meeting these conditions is of the form $\alpha \exp(-\beta |x|)/|x|^{1+\rho}$ for jump size $x$ with $\rho > 0$. The log arrival rate is in this case linear in the jump size and the log of the jump size, with the coefficient on the log of the jump size being above unity in absolute value. For $\rho > 1$ we have infinite variation, and $\rho = 0$ is the case of the gamma process, or in this case the difference of two gamma processes, which we will note later is the VG model. On the other hand, if the jump sizes are exponentially distributed with a finite arrival rate, as postulated for example in Das and Foresi (1996), then the log arrival rates are linear in just the size, with the coefficient on log size being 0, or $\rho = -1$. In contrast, the log arrival rate of the compound-Poisson process with Gaussian jump sizes (see Cox and Ross (1976)) is linear in the size and the square of the size.
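The two integrability conditions (infinite arrival rate, finite variation) can be checked numerically for a density of the stated form. The sketch below is an illustration under assumed parameter values ($\alpha = \beta = 1$, $\rho = 0.5$) and uses a plain midpoint rule after the substitution $x = e^u$; none of the names come from the chapter. As the lower cutoff shrinks, the truncated arrival rate keeps growing while the truncated integral of jump size times density settles down.

```python
import math

def levy_density(x, alpha=1.0, beta=1.0, rho=0.5):
    """alpha * exp(-beta |x|) / |x|^(1 + rho), with illustrative parameters."""
    return alpha * math.exp(-beta * abs(x)) / abs(x) ** (1.0 + rho)

def truncated_integral(g, eps, upper=1.0, n=20000):
    """Midpoint rule for the integral of g over [eps, upper], using x = exp(u)."""
    a, b = math.log(eps), math.log(upper)
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = math.exp(a + (i + 0.5) * h)
        total += g(x) * x * h  # dx = x du
    return total

def arrival(eps):
    """Arrival rate of positive jumps with size between eps and 1."""
    return truncated_integral(levy_density, eps)

def variation(eps):
    """Integral of jump size times density over [eps, 1]."""
    return truncated_integral(lambda x: x * levy_density(x), eps)

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, arrival(eps), variation(eps))
```

For these parameter values the limit of `variation(eps)` as the cutoff goes to zero is the incomplete gamma integral $\sqrt{\pi}\,\mathrm{erf}(1)$, which gives a convenient check on the quadrature.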
Since the exponential of a negative quadratic shifts from being concave near zero to convex near infinity, such a Lévy density is not completely monotone. A cursory evaluation of these structural properties may be simply made by regressing log arrival rates on the size of jumps, their log and their square. For our 100 year data on daily returns on the DJIA we counted the number of arrivals of jumps in the different size categories and then regressed the log of the empirically observed arrival rate on the size of the jump, its log and its square. For the Cox and Ross (1976) model the log arrival rates have a single representation that is not distinguished by the sign of the jump, while for the Das and Foresi and VG type models the parameters vary with sign, so the latter two model estimates allow for this by separating out the positive and negative moves. Table 4 presents the results of these regressions.

Table 4. Regression of log arrival rates on the sizes of jumps. Standard errors are in parentheses.

Log arrival rates of drops
              Constant    Jump size    Log size    R2
1897–1997     −9.88       −31.6        −1.92       0.97
              (1.44)      (8.36)       (0.32)
1897–1945     −8.51       −33.0        −1.65       0.97
              (1.45)      (8.53)       (0.32)
1946–1997     −12.35      −32.0        −2.41       0.95
              (2.22)      (17.78)      (0.45)

Log arrival rates of rises
              Constant    Jump size    Log size    R2
1897–1997     −11.55      −24.5        −2.25       0.96
              (1.71)      (9.10)       (0.38)
1897–1945     −10.29      −25.4        −1.99       0.97
              (1.65)      (8.97)       (0.37)
1946–1997     −13.66      −25.8        −2.67       0.93
              (3.23)      (24.45)      (0.65)

Arrival rates for jump diffusion
              Constant    Jump size    Size2       R2
1897–1997     −3.66       −1.73        −447        0.70
              (0.53)      (3.86)       (66)
1897–1945     −3.36       −1.77        −421        0.71
              (0.48)      (3.66)       (62)
1946–1997     −3.17       1.54         −928        0.64
              (0.65)      (8.98)       (191)

Source: Bakshi and Madan (1998), What is the probability of a stock market crash, working paper, University of Maryland.

From Table 4 we observe that the coefficient of log size in the first two regressions is significantly different from zero and may even be close to two, which definitely argues against a process with a finite arrival rate, as in Das and Foresi (1996). As in a number of cases the coefficient is estimated above two, the process may be one of infinite variation. However, we cannot reject the hypothesis that this coefficient is below two and hence we may have a process of finite variation. As will be argued later, there are other reasons for entertaining a finite variation process, and in the absence of strong evidence to the contrary we conclude in favor of finite variation processes with infinite arrival rates. Regarding the comparison with the Cox and Ross (1976) process with quadratic log arrival rates, we note that the linear term is in all cases insignificant, suggesting a pure quadratic model, but note further that one explains only up to 70% of the variation in arrival rates, compared with up to 97% of the variation using the completely monotone density.
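The regressions reported in Table 4 are ordinary least squares fits of log arrival rates on a constant, the jump size and the log of the jump size. A minimal sketch of such a fit is given below. It does not use the DJIA counts: the data are synthetic and noise-free, generated from coefficients chosen to match the first row of Table 4 purely so that the fit has a known answer, and the helper names and the size grid are assumptions of the example.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def ols(X, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic size categories from 0.5% to 10% (assumed grid).
sizes = [0.005 * i for i in range(1, 21)]
true_coef = (-9.88, -31.6, -1.92)  # constant, jump size, log size
log_rate = [true_coef[0] + true_coef[1] * x + true_coef[2] * math.log(x) for x in sizes]
X = [[1.0, x, math.log(x)] for x in sizes]
coef = ols(X, log_rate)
print(coef)
```

In the parameterization of the Lévy density used above, the fitted coefficient on log size estimates $-(1+\rho)$ and the coefficient on size estimates $-\beta$, so a log size coefficient between $-2$ and $-1$ corresponds to infinite activity with finite variation.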


2.5 Summary of empirical observations

We note from Tables 1 and 2 that both the statistical and risk neutral distributions are, for short intervals, not normal distributions. They have significant levels of excess kurtosis, and the risk neutral distribution in particular is also skewed to the left, with a heavier left tail than right tail. This absence of normality continues into the tail of the densities, as reflected by an analysis of extremes in Table 3. From Table 4 we infer that a reasonable model could be a pure jump model with an infinite arrival rate (a Lévy density integrating to infinity) and a process of finite variation. We also infer from Table 4 some support for a completely monotone Lévy density. Heavy risk neutral tails, if confirmed, imply that implied volatilities are strictly U-shaped and do not flatten out as one moves deep out of the money in both directions.

3 The implications of economic theory

Among the most far-reaching implications of economic theory are, it is now recognized, the consequences of the no arbitrage hypothesis. From early beginnings with Ross's (1976) theory of arbitrage, and its application to option pricing by Black and Scholes (1973) and Merton (1973), to the development of the martingale theory of pricing by Harrison and Kreps (1979) and Harrison and Pliska (1981), this hypothesis has yielded many deep and interesting results. We demonstrate in this section a continuation of these lessons and draw out more exactly the implications of this hypothesis for modeling the dynamics of the asset price.

Before proceeding we note an important proviso with regard to this hypothesis. Financial markets may display arbitrage opportunities, and there are many documented "so-called" anomalies that are suggestive of such a possibility, yet it remains true that models of the price process to be employed in developing derivative pricing models must be free of arbitrage. This is so for the simple reason of preventing traders from arbitraging a firm quoting arbitrageable prices. That models must be arbitrage free goes without question.

3.1 The stochastic process implications of no arbitrage

Four results, one from mathematical finance and the other three from the theory of stochastic processes, form the foundations for the stochastic process implications of the hypothesis of no arbitrage. The first of these results, from mathematical finance, demonstrates that the absence of arbitrage is equivalent to the existence of an equivalent martingale measure. The other results, from the theory of stochastic processes, characterize martingales.


3.1.1 No arbitrage and martingales

This result has many proofs or no proof, depending on the context and meaning to be attached to the idea of no arbitrage. In discrete time and with finitely many states there is no ambiguity and the result is true, with a proof going back to Harrison and Kreps (1979). At the other extreme we have continuous time and states given, at a minimum, by the relatively large set consisting of the paths of the stock price process. Here the existence of martingale measures easily implies the absence of arbitrage, but the implication in the reverse direction is not available, and this is the direction that concerns us here. Essentially the hypothesis of no arbitrage, merely asserting that one cannot combine a portfolio of existing assets to earn a non-negative, non-zero cash flow at a negative current price, is too weak to deduce the existence of a martingale measure. For interesting counterexamples of economies satisfying no arbitrage and yet not satisfying the existence of a martingale measure the reader is referred to Jarrow and Madan (1998).

In these richer contexts allowing an infinity of dynamic trading strategies, the hypothesis of no arbitrage must be strengthened to permit deduction of a martingale measure. The strengthening required is topological in nature and requires that one not be able to construct an approximation to an arbitrage opportunity in some limiting sense, and then it does follow that there exists an equivalent martingale measure. The first results in this direction are due to Kreps (1981). The difficulty with the result of Kreps (1981) is the weak sense in which the limit is taken, as the definition of approximation lacks a sense of uniformity, and what is regarded as an approximation may not be so from the perspective of other economic agents. The strongest results in this direction are due to Delbaen and Schachermayer (1994). They employ a strong and uniform sense of no arbitrage and show that if there is no random sequence of zero cost trading strategies converging in this strong sense to a non-negative, non-zero cash flow, with the random sequence being uniformly bounded below by a negative constant, then there exists a martingale measure, and the converse holds as well. They term this hypothesis No Free Lunch with Vanishing Risk (NFLVR) and prove that it is equivalent to the existence of an equivalent martingale measure.

3.1.2 Martingales and semimartingales

The second important result in ascertaining the stochastic process implications of the hypothesis of no arbitrage is Girsanov's theorem. This is pointed out by Delbaen and Schachermayer (1994) and amounts to noting that if there exists a change of measure from the true statistical measure P to a martingale measure or risk neutral measure Q such that under Q discounted asset prices are martingales, then it must be that under P the price process was a semimartingale to begin with.


This is a very useful realization as it informs us that models for price processes may safely be restricted to the class of semimartingale processes. Since the class of semimartingales is very wide indeed, one might argue that this is not a very important insight. On the other hand, a lot is known about the structure of semimartingales and for a modeler it is useful to know that the search may be constrained by this structure. Some recent examples of proposals for stock price processes that are not semimartingales include the use of fractional Brownian motion with the arbitrage demonstrated in Rogers (1997). Semimartingales are a difficult concept to communicate in precision, as they go beyond the idea of a simple concept and are in fact a fairly complete and very general theory of random processes, yet given their established importance to the field of mathematical finance today, it is imperative that we communicate some of the flavor of this theory, and do so with brevity. There are at least two approaches, one analytical and the other structural and it is best to consider the structural approach. From this perspective a semimartingale is described by its decomposition into a martingale plus a very general model for the drift of the process. This certainly includes linear drift but also more general models of the drift. One merely requires that this process be of finite and integrable variation, as well as being predictable (i.e. the limit of left continuous functions). Examples include Brownian motion with drift, solutions to stochastic differential equations like the mean reverting Cox, Ingersoll and Ross (1985) interest rate process and the VG model (Madan, Carr and Chang (1998)) with drift to be discussed later in the chapter. 
To appreciate what is not a semimartingale, we consider the discrete time continuous state context studied by Jacod and Shiryaev (1998), where they show that the no arbitrage property is lost if zero is not in the relative interior of the support of the multivariate return distribution over the discrete time step. We also learn from this paper that not all semimartingales are stock price models, as calendar time is a semimartingale with a zero martingale component and would admit arbitrage if it were a price process. The important property is to get zero into the relative interior of the support, at least in discrete time. Price processes must be semimartingales with a non-zero martingale component.

3.1.3 Semimartingales and time changed Brownian motion

The next result we employ in developing our understanding of the stochastic process implications of no arbitrage is a fundamental characterization of all semimartingales, due to Monroe (1978). This remarkable result shows that every semimartingale can be written as a Brownian motion (possibly defined on some adequately extended probability space) evaluated at a random time. This result is somewhat surprising at first, since Brownian motion, even if evaluated at a random time, is suggestive of a martingale, and as noted earlier semimartingales include simple linear drifts like time itself. However, this is only a problem at first glance, as the time change need not be independent of the Brownian motion, and calendar time $t$, for example, is Brownian motion $W(t)$ evaluated at the first time $T(t)$ at which this same Brownian motion reaches $t$. By this result the study of price processes is reduced to the study of time changes for Brownian motion, and one may consider both independent and dependent time changes.

One might ask what the time change represents. Ignoring price changes that are the possible result of noise or liquidity trades, changes in the price of an asset occur through trades motivated primarily for reasons of information. The cumulated arrival of relevant information is a reasonable, economically meaningful measure of the time change, which gets translated into buy or sell orders. Geman, Madan and Yor (2000) consider many models for the process of buy and sell orders and relate the time change in all these cases to some measure of economic activity. In some cases the measure is just the number of trades, while in other cases time is measured by the weighted sum of order arrivals, where the weights vary with the size of the order.

When time is viewed in this economically fundamental manner, the question of dependence or independence of the time change becomes an interesting and meaningful question. Certainly, some part of the order process and hence the time change, one would expect, is motivated by observations of the price process. This is the phenomenon of herding or runs on the asset. On the other hand, if the market is dominated by independent analysts who view the market price as always providing us with the most efficient and accurate valuation of the asset, i.e. it is a discounted martingale under the right measure, then there is no information to be extracted from prices that the market has not already extracted, and so no analysts are motivated in their trades by observations of price movements.
They are bound to seek independent and, as far as possible, private information as the motivating basis of their trading decisions. This interpretation of the process suggests an independent time change. We also note that, from a mathematical modeling viewpoint, it is easier to work with independent time changes, though we shall see cases where both representations are possible for the same process. Generally, the independent time change is the more tractable alternative and so far most of our successes come from processes of this type. The broad consistency of this hypothesis with the efficient markets hypothesis is therefore an attractive feature.

3.1.4 Continuous time changes and semimartingales

We come now to the crux of the issue, the continuity of the price process or otherwise. This brings us to the third and final result from the theory of stochastic processes shedding light on the nature of the price process as a consequence of no arbitrage. We note first that as the price process is a time changed Brownian motion, it will be a continuous process essentially only if the time change is continuous. The implications of supposing such continuity in the time change rely on results characterizing continuous semimartingales (Revuz and Yor (1994), page 190).

Let $X(t)$ be a continuous semimartingale, be it the price process or the time change. Let $V(t)$ be the quadratic characteristic of the semimartingale $X(t)$, which exists by virtue of $X$ being a semimartingale. In the terminology of Wall Street the process $V(t)$ is akin to the realized total variance on the process $X(t)$. If the process $X(t)$ has a well defined sense of a variance rate per unit time, or equivalently $V(t)$ is differentiable in $t$, then the quadratic characteristic is absolutely continuous with respect to Lebesgue measure and in this case we may write the process $X(t)$ as a stochastic integral with respect to Brownian motion. Under these conditions there exist processes $a(t)$, $b(t)$ and a standard Brownian motion $W(t)$ such that

$$X(t) = X(0) + \int_0^t a(s)\,ds + \int_0^t b(s)\,dW(s). \tag{1}$$

Consider now the implications of $X(t)$ being a time change and the price process in turn. If $X(t)$ is a time change, then it is an increasing process and so $b(t)$ must be identically zero. This implies that the time change is locally deterministic, with no uncertainty in the local rate of time change, which is then $a(t)$. If we view the time change, as suggested earlier, as a measure of economic activity, proxied by the rate of arrival of information, orders, or size weighted orders, then one would expect some local uncertainty in the time change, and this argues against the use of a locally deterministic time change and hence, by implication, a continuous semimartingale as a model for the price process.

On the other hand, if one views $X(t)$ directly as a price process, the representation (1) argues that the local motion of the stock return must be Gaussian. Given the considerable evidence cited against the likelihood of this possibility, we conclude once again that a continuous semimartingale is not an appropriate model for the price process. Now it is possible that there is a continuous martingale component in the price process in addition to a jump component, as is the case for jump diffusions, but the necessity of introducing such a diffusion term onto a functioning purely discontinuous model must be separately argued for. As we will observe, the latter class of models contains many alternatives capable of approximating very closely the structural characteristics of diffusions.

3.1.5 Summary of the consequences of no arbitrage

We showed in this section that no arbitrage implies, via the existence of an equivalent martingale measure, that the price process is a semimartingale. We then observed that all semimartingales are time changed Brownian motions, time changed by a random increasing time change. The resulting process could be continuous only if the time change is locally deterministic. Relating time changes to measures of economic activity with some local uncertainty, we argued that the price process was not a continuous process. We also observed that such continuity implies that the process is locally Gaussian, for which we have ample evidence to the contrary, and so once again we concluded that the process cannot be continuous. The remaining sections will take up the issue of modeling using purely discontinuous processes and demonstrate their effectiveness. The need to add an additional continuous process onto a functioning purely discontinuous process must in our view be argued for on theoretical and empirical grounds. Carr, Geman, Madan and Yor (2000) present evidence to the contrary.

4 Economic models of finite variation for asset price processes

Statistical and economic analysis suggests that we entertain purely discontinuous price processes with possibly infinite arrival rates, and finite variation. An attractive feature of finite variation processes is that they may be decomposed as the difference of two increasing processes, a property lost in Brownian motion and other processes of infinite variation. This permits, for the first time, a separation of the price process into the process of up ticks and down ticks. Our analysis of optimal contracting in such economies indicates that the major demand for short maturity at-the-money options in such economies arises from a desire on the part of investors to be positioned differently with respect to upward and downward movements in the market, a position not attainable by direct stock investment alone. Hence options, and short maturity at-the-money options in particular, play a fundamental role in such economies: a role that may be consistent with casual observations of high activity in these markets. The next step forward from correctly adjusting one's delta or stock position is the optimal positioning of the up and down deltas via option trades.

To effectively answer these questions it is imperative that we focus attention, separately, on the up and down forces of the market. We propose here two classes of models accomplishing this objective. The models differ in their primitives and are structurally distinct, yet we show in the next section that under some fairly reasonable conditions they are in fact equivalent. However, tractability is enhanced by working with both specifications, as it can be difficult to find the equivalent formulation from the alternate perspective.
The first class of models takes as primitives two increasing processes that represent cumulated orders to buy and sell at market, and models the price responses as these orders are cleared through the limit sell and buy books respectively. Economic activity and the related concepts of economic time reflect cumulated orders of both types in this representation of the price process. We term this class of models the Order Processing Models (OPM).

The second class of models is related to traditional models of dynamic price adjustment, with price changes expressed as a function of the level of excess demand in the economy. This response function is termed the force function of the economy, as it measures price pressure in its relationship with excess demand. The excess demand itself is modeled by a Brownian motion, with the equilibrium points given by the zero set of Brownian motion. Economic time in these models is given by cumulated squared price responses or the realized variance. This class of models we refer to as Dynamic Price Adjustment Models (DPA).

4.1 Prices in the order processing model (OPM)

The primitives in this view of the price process are two increasing processes that represent cumulated market buy orders, $U(t)$, and cumulated market sell orders, $V(t)$. We have noted in our discussion of time changes that increasing random processes with local uncertainty are necessarily purely discontinuous. By taking as primitives such increasing random processes, the fundamental uncertainties of the economy are discontinuous, and prices modeled as market responses to such inherit this property. We define the jump in the process $U(t)$ at time $t$ by $\Delta U(t) = U(t) - U(t^-)$, where we note that the processes are by construction right continuous with left limits, so that $U(t) = \lim_{s \downarrow t} U(s)$ while $U(t^-) = \lim_{s \uparrow t} U(s)$, and likewise for $V(t)$, $V(t^-)$ and $\Delta V(t)$. The property of being increasing and purely discontinuous implies that

$$U(t) = \sum_{s \le t} \Delta U(s), \qquad V(t) = \sum_{s \le t} \Delta V(s),$$

so that the current value of each process is just the sum of all the jumps that have occurred to date.

Price changes are modeled in Geman, Madan and Yor (2000) by market responses to these market orders. Here we describe the process of price increases. The magnitude $\Delta U(t)$ is viewed as a buy order at the prevailing price of $p(t^-)$, which by construction cannot be accessed. There is a downward sloping demand curve $q_d^u(p(t)/p(t^-), \Delta U(t), t)$ that is $\Delta U(t)$ at $p(t) = p(t^-)$, and an upward sloping supply curve $q_s^u(p(t)/p(t^-), \Delta U(t), t)$ that is zero at $p(t) = p(t^-)$; these must be equated to determine both the quantity transacted, $q^u = q_d^u = q_s^u$, and the price response $p(t)$. The solution gives the price response in log form by

$$\ln\left(\frac{p(t)}{p(t^-)}\right) = \varepsilon_u(\Delta U(t), t).$$

A similar analysis yields the price response to a market sell order

$$\ln\left(\frac{p(t)}{p(t^-)}\right) = -\varepsilon_v(\Delta V(t), t).$$

The price process is obtained as an aggregation of the price responses to market buy and sell orders

$$\ln(p(t)) = \ln(p(0)) + \sum_{s \le t} \varepsilon_u(\Delta U(s), s) - \sum_{s \le t} \varepsilon_v(\Delta V(s), s)$$

and is by construction the difference of two increasing processes, and therefore a finite variation process. It is also purely discontinuous in that it is precisely the sum of all its jumps. Geman, Madan and Yor (2000) rewrite such processes in many cases as time changed Brownian motion and study the relationship between the time change and the market primitives, showing that the time change is generally a size weighted sum of the market buy and sell order processes. Hence the interpretation of time changes as measures of the level of economic activity.
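The aggregation of price responses into a log price path can be sketched in a few lines. The response function used here, a concave power of the order size, is purely hypothetical (the chapter derives responses from supply and demand curves and does not specify this form), and the same function is used for buy and sell orders only for brevity. Function names and the example orders are likewise assumptions.

```python
def response(order_size, scale=0.01, exponent=0.5):
    """Hypothetical log price response to a market order of the given size."""
    return scale * order_size ** exponent

def opm_log_price(log_p0, buys, sells):
    """Log price path from market buy and sell orders.

    buys, sells: lists of (time, order_size). Each buy order moves the log
    price up by its response, each sell order moves it down. Returns the
    list of (time, log price) pairs just after each order.
    """
    events = [(t, response(q)) for t, q in buys]
    events += [(t, -response(q)) for t, q in sells]
    events.sort(key=lambda e: e[0])
    path, level = [], log_p0
    for t, jump in events:
        level += jump
        path.append((t, level))
    return path

buys = [(0.1, 4.0), (0.7, 1.0)]   # illustrative orders
sells = [(0.4, 9.0)]
path = opm_log_price(0.0, buys, sells)
up_moves = sum(response(q) for _, q in buys)
down_moves = sum(response(q) for _, q in sells)
print(path, up_moves, down_moves)
```

The path is piecewise constant between orders, and `up_moves` and `down_moves` are the two increasing components whose difference gives the log price change, so their sum is the total variation of the path.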

4.2 The dynamic adjustment model (DPA) This formulation of the price process begins with a traditional price adjustment model of the form d ln( p) = f (z(t)) dt where z(t) is a measure of excess demand and f represents the force by which prices respond to excess demand in the economy. This function we term the force function of the economy. By construction f (x) ≥ 0 for x > 0 and f (x) ≤ 0 for x < 0. Excess demand is exogeneously modeled as dominated by new information and is given by a Brownian motion W (t). It follows that t ln( p(t)) = ln( p(0)) + f (W (s))ds. 0

Equilibrium times are of course given by the zero set of Brownian motion and there are arbitrage opportunities to be made during upward or downward rallies by buying or selling and then reversing the trade before the end of the rally. Such intra rally trades are not available to general market participants whose price access is only at equilibrium times. The restriction to equilibrium times, the zero set of

124

D. B. Madan

Brownian motion, is accomplished by evaluating the above process at the inverse local time of Brownian motion at zero, σ (t). We therefore define σ (t) ln( p(t)) = ln( p(0)) + f (W (s))ds. (2) 0

This process is once again a purely discontinuous process, inheriting this property from that of inverse local time. It may be decomposed as the difference of two increasing processes

$$\ln\frac{p(t)}{p(0)} = \int_0^{\sigma(t)} f^+(W(s))\,ds - \int_0^{\sigma(t)} f^-(W(s))\,ds$$

where $f^+(x) = f(x)\mathbf{1}_{(x \ge 0)}$ and $f^-(x) = -f(x)\mathbf{1}_{(x \le 0)}$ are both non-negative, and is a process of finite variation under the condition

$$\int_{-K}^{K} |f(x)|\,dx < \infty \quad \text{for all } K.$$

It is interesting to enquire into the nature of the force function in the economy. For example, if f(x) > 0 for all x > 0 and f(x) < 0 for x < 0 then the price process is one with an infinite arrival rate of jumps. On the other hand there are finitely many jumps in any interval if f(x) = 0 in a neighborhood of zero. Another interesting question is whether the force is immediately infinite and decreasing for larger excess demands or whether it rises with the level of excess demand. Geman, Madan and Yor (2000) present many explicit solutions that may be employed to answer such questions. They also show that such a process may be written as Brownian motion evaluated at a time change that aggregates the squared price responses and is thereby a measure of realized variance.

5 Prices as Lévy processes

Finite variation asset price processes are by construction the difference of two increasing processes and section 4 has described two classes of economic models that give rise to such processes. We now wish to construct specific examples of such processes that may be evaluated empirically in their adequacy as models for the statistical dynamics of the price process, and as models for the pricing densities reflected in option prices. This statistical evaluation is enhanced if one has effective descriptions of the transition densities for use in maximum likelihood estimation and closed form or otherwise fast and accurate computation methods for the prices of European options when the underlying process is in the described class. Both these objectives are simultaneously met by an analytic closed form for the characteristic function of the log of the stock price at a future date. The density is then easily evaluated by Fourier inversion and maximum likelihood estimation is feasible; alternatively, one may follow the methods outlined in Madan and Seneta (1989) and estimate parameters by maximum likelihood on transformed variates. Option prices are easily obtained from the characteristic function, as described in Bakshi and Madan (1998), and a faster algorithm is provided in Carr and Madan (1998). Carr and Madan show how to write analytically the Fourier transform in log strike of an exponentially damped call price, in terms of the characteristic function of the log stock price. The damped call price and call price are then obtained by a single Fourier inversion that may even invoke the fast Fourier transform. The characteristic function of the log stock price is therefore seen as the key to efficient model validation from both a statistical and risk neutral perspective.

5.1 The characteristic function of log price relatives

In constructing alternatives to Brownian motion as models of the fundamental uncertainty driving the stock price, that may meet our requirements of being a purely discontinuous process of finite variation with a possibly infinite arrival rate of shocks, we focus in the first instance on keeping all the properties of Brownian motion except those that must be given up. We are well aware that just as more complex models allowing for stochastic volatility and correlations of various sorts can be constructed out of Brownian motions by combining them in various ways, the same can be done with any candidate process that replaces Brownian motion.

The first property of Brownian motion that we seek to keep is the analytically rich property of being a process of independent increments, identically distributed over non-overlapping intervals of equal lengths of time. This introduces a homogeneity of the base uncertainty across time, that may be altered through parametric shifts in later developments. In any case, for modeling the local motion, homogeneity should be a reasonable hypothesis from at least the perspective of a local approximation that employs some average density of moves, even if the actual ones are state contingent and time varying.

The second property, which we may or may not keep, is that of finite moments of all orders. We are modeling continuously compounded returns and this should in principle be a bounded random variable, even if it is difficult to organize this within a modeling context, and hence the finiteness of moments is really a non-issue. Considerations of analytical tractability may on occasion require us to consider processes with infinite moments, but my priority is to avoid them as far as possible. The theory of stochastic processes has a lot to teach us about processes meeting these conditions.
Such processes are called infinitely divisible and the Lévy–Khintchine theorem (see Feller (1971) and Bertoin (1996)) provides us with a complete characterization of the characteristic function. Specifically, let X(t) = log(S(t)) be the continuous time process for the log of the stock price with mean µt, and further suppose that X(t) is a finite variation process of independent identically distributed increments. Then there exists a unique measure Π defined on ℝ − {0} such that

$$\phi_{X(t)}(u) \stackrel{\text{def}}{=} E\left[\exp(iuX(t))\right] = \exp\left(iu\mu t + t\int_{-\infty}^{\infty}\left(e^{iux} - 1\right)\Pi(dx)\right).$$

The measure Π is called the Lévy measure of the process and X(t) is a Lévy process. When the measure has a density k(x), we may write

$$\phi_{X(t)}(u) = \exp\left(iu\mu t + t\int_{-\infty}^{\infty}\left(e^{iux} - 1\right)k(x)\,dx\right) \tag{3}$$

and we refer to the function k(x) as the Lévy density. Heuristically the density k(x) specifies the arrival rate of jumps of size x and the Lévy process X(t) is a compound Poisson process with a finite arrival rate if the integral of the Lévy density is finite. We shall primarily be concerned with Lévy processes with an infinite arrival rate. The Lévy process may always be approximated by a compound Poisson process obtained by truncating the Lévy density in a neighborhood of zero, and using as an arrival rate

$$\lambda = \int_{|x| > \varepsilon} k(x)\,dx$$

and as a density for the jump magnitude conditional on the arrival, the density

$$g(x) = \frac{k(x)\mathbf{1}_{|x| > \varepsilon}}{\lambda}.$$

The convergence occurs as we let ε → 0. Geman, Madan and Yor (2000) present many examples of candidate Lévy processes that are associated with the two economic models OPM and DPA of section 4.
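The truncation argument can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the one-sided density k(x) = exp(−x/ν)/(νx), x > 0 (the gamma process density that appears in section 6.1), and evaluates the truncated arrival rate λ(ε) by a simple midpoint rule on a logarithmic grid. The function names, the grid size and the upper cut-off are choices made for this example and are not taken from the chapter.

```python
import math

def gamma_levy_density(x, nu):
    """One-sided Levy density k(x) = exp(-x/nu) / (nu * x) for x > 0 (illustrative choice)."""
    return math.exp(-x / nu) / (nu * x)

def integrate_log_grid(f, lo, hi, n=20000):
    """Midpoint rule for the integral of f over [lo, hi] after substituting x = exp(y)."""
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / n
    total = 0.0
    for j in range(n):
        x = math.exp(a + (j + 0.5) * h)
        total += f(x) * x  # dx = x dy
    return total * h

def truncated_arrival_rate(eps, nu):
    """lambda(eps): integral of k(x) over x > eps (tail cut at 60*nu, where k is negligible)."""
    return integrate_log_grid(lambda x: gamma_levy_density(x, nu), eps, 60.0 * nu)

def truncated_mean_rate(eps, nu):
    """Integral of x * k(x) over x > eps: the mean jump flow kept by the truncation."""
    return integrate_log_grid(lambda x: x * gamma_levy_density(x, nu), eps, 60.0 * nu)

if __name__ == "__main__":
    nu = 0.2
    for eps in (1e-1, 1e-2, 1e-4, 1e-6):
        print(eps, truncated_arrival_rate(eps, nu), truncated_mean_rate(eps, nu))
```

For this density the arrival rate grows without bound as ε shrinks, while the integral of x k(x) stays bounded, which is the combination of infinite arrival rate and finite variation discussed in the text.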

5.2 Robustness of finite variation Lévy processes

Continuous time processes with continuous sample paths have a certain lack of robustness best illustrated by considering geometric Brownian motion under two different but close volatilities. Two individuals could perhaps hold such different views on volatility but as a consequence their probability measures are no longer equivalent but are in fact singular. The set of paths receiving probability 1 under one measure has probability 0 under the other measure. The measures are not robust, in the sense of equivalence, to different volatility beliefs. This lack of robustness is really a consequence, not of continuity, but of infinite variation.


Hence, remaining in the class of finite variation processes enhances robustness of the models to heterogeneity of views on various parameters. To appreciate this point we note (Jacod and Shiryaev (1980), page 159) that when two Lévy processes with Lévy densities k(x) and k'(x) are equivalent then there exists a positive measurable function Y(x) such that

$$k'(x) = Y(x)\,k(x) \tag{4}$$

and

$$\int_{-\infty}^{\infty} \left|(|x| \wedge 1)(Y(x) - 1)\right| k(x)\,dx < \infty. \tag{5}$$

One may rewrite (5) on employing (4) as

$$\int_{k > k'} (|x| \wedge 1)\left(k(x) - k'(x)\right)dx + \int_{k' > k} (|x| \wedge 1)\left(k'(x) - k(x)\right)dx < \infty \tag{6}$$

and observe that on the set |x| > 1 the required integrability holds by virtue of the integrability of the Lévy densities on this set. On the set |x| < 1 we have the integrability condition

$$\int_{\{|x|<1,\ k > k'\}} |x|\left(k(x) - k'(x)\right)dx + \int_{\{|x|<1,\ k' > k\}} |x|\left(k'(x) - k(x)\right)dx < \infty$$

and this condition essentially requires that the difference between the two Lévy measures be a finite variation process and holds automatically if both Lévy processes are of finite variation. Hence for finite variation processes, equivalence just requires absolute continuity of the measures with respect to each other, that is the condition (4) with no integrability conditions. Restrictions on the ability to change parameters like volatility in geometric Brownian motion follow from the integrability conditions for equivalence and apply to processes with infinite variation. In this regard one may consider the Lévy measure studied in Geman, Madan and Yor (2000) of the form

$$k(x) = \frac{e^{-x}}{x^{2+\alpha}} \quad \text{for } x > 0.$$

For α > 0 this process has infinite variation and the parameter generating the infinite variation is α. This parameter cannot be changed if equivalence is to be preserved. Specifically, if

$$k'(x) = \frac{e^{-x}}{x^{2+\beta}}$$

for α ≠ β and α, β > 0 the two measures are no longer equivalent and it is the integrability condition (5) that fails.
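The failure of the integrability condition can be seen numerically. The following is a minimal sketch, assuming the condition on |x| < 1 is read as finiteness of the integral of |x| |k(x) − k'(x)| near zero. It compares the tempered power-law pair from the text (α ≠ β) with a pair of finite variation densities of the form exp(−x/ν)/(νx), which is an illustrative choice made here. All function names are hypothetical.

```python
import math

def weighted_gap(k1, k2, eps, n=20000):
    """Integral over eps < x < 1 of x * |k1(x) - k2(x)| dx (midpoint rule in log x)."""
    a = math.log(eps)
    h = -a / n  # the upper limit is x = 1, i.e. y = 0
    total = 0.0
    for j in range(n):
        x = math.exp(a + (j + 0.5) * h)
        total += x * abs(k1(x) - k2(x)) * x  # the extra x is the Jacobian of x = exp(y)
    return total * h

def tempered_power(alpha):
    """k(x) = exp(-x) / x**(2 + alpha), the infinite variation example of the text."""
    return lambda x: math.exp(-x) / x ** (2.0 + alpha)

def gamma_type(nu):
    """k(x) = exp(-x/nu) / (nu * x), a finite variation density (illustrative choice)."""
    return lambda x: math.exp(-x / nu) / (nu * x)

if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-6):
        infinite_var = weighted_gap(tempered_power(0.2), tempered_power(0.4), eps)
        finite_var = weighted_gap(gamma_type(0.2), gamma_type(0.3), eps)
        print(eps, infinite_var, finite_var)
```

As ε decreases, the first column of results keeps growing while the second settles down, in line with the claim that parameter changes are unrestricted only in the finite variation case.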


5.3 Complete monotonicity (CM)

There are of course many Lévy densities that one may employ in modeling the price process. It is therefore useful if the collection of possible choices can be reduced by invoking some structural properties. One such property is that of complete monotonicity. The idea is to require the arrival rates of large jumps to be less than the arrival rates of small jumps. This suggests that k(x) be decreasing in |x|, or that k'(x) ≤ 0 for x > 0 and k'(x) ≥ 0 for x < 0. The first derivative of the Lévy density is therefore of one sign on each side of zero. The property of complete monotonicity requires that all the derivatives, and not just the first, have this property of having the same sign on each side of zero. By a result of Bernstein this property is equivalent to requiring k(x) for x > 0 to be the Laplace transform of a positive measure on the positive half line and similarly for k(x) for x < 0. Specifically we require that there exist measures $G_p$ and $G_n$ such that

$$k(x) = \int_0^{\infty} e^{-ax}\,G_p(da) \quad \text{for } x > 0,$$

$$k(x) = \int_0^{\infty} e^{ax}\,G_n(da) \quad \text{for } x < 0.$$

The Lévy density is then a mixture of exponential densities. An important result that follows for such Lévy densities is that the two classes of economic models OPM and DPA are equivalent under the CM property.
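As a small worked check of the Bernstein representation, consider the density exp(−ax)/x for x > 0, the shape that appears for the gamma process in section 6.1. It equals the integral of exp(−sx) over s > a, so one admissible mixing measure is Lebesgue measure restricted to s > a. The sketch below verifies this identity by a midpoint rule and checks the signs of the first two derivatives by finite differences. The function names and step sizes are assumptions made for this illustration.

```python
import math

def exp_over_x(x, a):
    """Candidate one-sided Levy density k(x) = exp(-a x) / x for x > 0."""
    return math.exp(-a * x) / x

def laplace_of_lebesgue_tail(x, a, n=20000):
    """Midpoint approximation of the integral of exp(-s x) over s in (a, infinity).

    This is the Laplace transform of the positive measure G_p(ds) = ds on s > a.
    The upper limit is cut at a + 60 / x, beyond which exp(-s x) is negligible.
    """
    h = (60.0 / x) / n
    return sum(math.exp(-(a + (j + 0.5) * h) * x) for j in range(n)) * h

def derivative_signs(x, a, h=1e-3):
    """Signs of the first and second derivatives of k at x by central differences."""
    first = (exp_over_x(x + h, a) - exp_over_x(x - h, a)) / (2.0 * h)
    second = (exp_over_x(x + h, a) - 2.0 * exp_over_x(x, a) + exp_over_x(x - h, a)) / h**2
    return math.copysign(1.0, first), math.copysign(1.0, second)

if __name__ == "__main__":
    for x in (0.1, 0.5, 2.0):
        print(x, exp_over_x(x, 3.0), laplace_of_lebesgue_tail(x, 3.0), derivative_signs(x, 3.0))
```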

5.3.1 Equivalence of OPM and DPA under CM

In particular, for every force function defining the price response under DPA, the resulting price process of equation (2) is a Lévy process with a completely monotone Lévy density. Geman, Madan and Yor (2000) give numerous examples of force functions and their associated Lévy densities. For example, if the force function is $x^m$ for some integer m > 0 then the process is one of independent stable increments with index $\alpha = (1/2 + m)^{-1}$. Conversely, every Lévy process with such a completely monotone Lévy density can be written as the integral of a functional of Brownian motion up to the inverse local time of the Brownian motion. This equivalence result is an application of analytical results from number theory called Krein's theory, and the specific construction of the force function from the Lévy density and vice versa remains a difficult, if not impossible, task. Specifically, for the variance gamma model that we introduce next, we know the Lévy density quite explicitly but are not aware of what the force function is in this case.


6 The variance gamma model

Purely discontinuous processes of finite variation with infinite arrival rates contain a particularly tractable and parametrically parsimonious subclass of processes that is constructed from two very well known processes, Brownian motion and the gamma process. This is the so-called variance gamma process first studied by Madan and Seneta (1990). The process studied in Madan and Seneta (1990) was the symmetric variance gamma process that is obtained on evaluating Brownian motion at gamma time. An asymmetric risk neutral process was developed by Madan and Milne (1991) by assuming that a Lucas representative agent with power utility had to hold the risk exposure in a symmetric variance gamma process. It was shown in Madan, Carr and Chang (1998) that the resulting risk neutral process was equivalent to evaluating Brownian motion with drift at gamma time. Given the importance of asymmetry or skewness in option pricing, we focus directly on this asymmetric variance gamma process but will refer to it as the variance gamma process. The process is parametrically parsimonious in that only two additional parameters are involved beyond the volatility introduced by Black and Scholes, and these two parameters give us control over skewness and kurtosis, that are precisely the primary concern in modeling and assessing derivative risks.

6.1 The variance gamma process

Let Y(t; σ, θ) be a Brownian motion with drift θ and variance rate σ². If W(t) is a standard Brownian motion, we may write the process Y(t; σ, θ) in terms of W(t) as

$$Y(t; \sigma, \theta) = \theta t + \sigma W(t).$$

The variance gamma process is obtained on evaluating the process Y at an independent random time given by a gamma process. For this we define the process G(t; ν) with independent increments, identically distributed over non-overlapping intervals of length h, with the increments g = G(t + h; ν) − G(t; ν) having the gamma density

$$p(g, h) = \frac{g^{h/\nu - 1}\exp(-g/\nu)}{\nu^{h/\nu}\,\Gamma(h/\nu)}.$$

The mean of the gamma density is h and the variance is νh. Hence the average random time change in h units of calendar time is h and its variance is proportional to the length of the interval. The gamma density is infinitely divisible with characteristic function

$$E\left[\exp(iug)\right] = \left(\frac{1}{1 - iu\nu}\right)^{h/\nu}$$

and the gamma process is an increasing Lévy process with a one sided Lévy density

$$k(x) = \frac{\exp(-x/\nu)}{\nu x}, \quad \text{for } x > 0.$$

Both the gamma process and Brownian motion are highly tractable processes about which a lot is known and each process has seen many domains of application. The variance gamma process is the process X(t; σ, ν, θ) defined by

$$X(t; \sigma, \nu, \theta) = Y(G(t; \nu); \sigma, \theta) = \theta G(t; \nu) + \sigma W(G(t; \nu)) \tag{7}$$

or Brownian motion with drift θ and variance rate σ² evaluated at the gamma time G(t; ν). Apart from the variance rate of the Brownian motion σ², the two other parameters are θ and ν. We shall observe that it is θ that generates skewness while kurtosis is primarily controlled by ν.

6.1.1 Characteristic function of the variance gamma process

The characteristic function of the variance gamma process is easily evaluated by conditioning on the gamma process first and then employing the characteristic function of the gamma process itself. It has a simple analytic form of a quadratic raised to a negative power. Specifically,

$$\phi_{X(t)}(u) \stackrel{\text{def}}{=} E\left[\exp(iuX(t))\right] = \left(\frac{1}{1 - iu\theta\nu + \sigma^2\nu u^2/2}\right)^{t/\nu}. \tag{8}$$

The Black–Scholes and Merton model employing Brownian motion is a limiting case of this model since the process converges to Brownian motion with drift as one lets the volatility of the time change ν tend to zero. This may also be observed from the characteristic function on letting t/ν tend to infinity as ν tends to zero and noting that the limit is precisely exp(iuθt − σ²u²t/2), the characteristic function of Brownian motion with drift. We also note that if θ is zero, the characteristic function is real valued and the process is therefore symmetric and there is no skewness, hence validating the claim that skewness is generated by θ ≠ 0. This observation is even clearer once we have constructed the Lévy measure for the VG process.

6.1.2 Moments of the variance gamma process

The moments of the VG process are easily obtained by exploiting the structure of the process or by differentiating the characteristic function. It is shown in Madan, Carr and Chang (1998) that

$$\begin{aligned}
E[X(t)] &= \theta t \\
E\left[(X(t) - E[X(t)])^2\right] &= (\theta^2\nu + \sigma^2)\,t \\
E\left[(X(t) - E[X(t)])^3\right] &= (2\theta^3\nu^2 + 3\sigma^2\theta\nu)\,t \\
E\left[(X(t) - E[X(t)])^4\right] &= (3\sigma^4\nu + 12\sigma^2\theta^2\nu^2 + 6\theta^4\nu^3)\,t + (3\sigma^4 + 6\sigma^2\theta^2\nu + 3\theta^4\nu^2)\,t^2.
\end{aligned}$$
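The construction (7) and the moment formulas can be cross-checked by simulation. The sketch below is one possible illustration, not the procedure used in the cited papers: it draws the gamma time G(t; ν) with Python's `random.gammavariate` (shape t/ν, scale ν, so that the mean is t and the variance νt), evaluates θG + σ√G·Z with Z standard normal, and compares sample moments with the formulas. The function names, parameter values and sample size are assumptions chosen for this example.

```python
import math
import random

def vg_central_moments(sigma, nu, theta, t):
    """Mean and second to fourth central moments of X(t), from the formulas in the text."""
    mean = theta * t
    m2 = (theta**2 * nu + sigma**2) * t
    m3 = (2.0 * theta**3 * nu**2 + 3.0 * sigma**2 * theta * nu) * t
    m4 = ((3.0 * sigma**4 * nu + 12.0 * sigma**2 * theta**2 * nu**2
           + 6.0 * theta**4 * nu**3) * t
          + (3.0 * sigma**4 + 6.0 * sigma**2 * theta**2 * nu
             + 3.0 * theta**4 * nu**2) * t**2)
    return mean, m2, m3, m4

def simulate_vg(sigma, nu, theta, t, n, seed=0):
    """Draw n samples of X(t) = theta*G + sigma*W(G) with G a gamma time change."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        g = rng.gammavariate(t / nu, nu)  # gamma time with mean t and variance nu*t
        samples.append(theta * g + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0))
    return samples

if __name__ == "__main__":
    sigma, nu, theta, t = 0.2, 0.5, -0.1, 1.0
    xs = simulate_vg(sigma, nu, theta, t, 100000, seed=1)
    sample_mean = sum(xs) / len(xs)
    sample_var = sum((x - sample_mean) ** 2 for x in xs) / len(xs)
    print("sample mean and variance:", sample_mean, sample_var)
    print("formula mean and variance:", vg_central_moments(sigma, nu, theta, t)[:2])
```

With θ = 0 and t = 1 the ratio of the fourth central moment to the squared variance returned by `vg_central_moments` reduces to 3(1 + ν).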

We observe again that skewness is zero if θ = 0. Furthermore, in the case of θ = 0 we have that the fourth central moment divided by the square of the second central moment, or the kurtosis, is 3(1 + ν/t), which is 3(1 + ν) over a unit time interval. This leads to the interpretation that the parameter ν controls kurtosis and is in fact (for θ = 0) the percentage excess kurtosis over the kurtosis of the normal distribution, which is three.

6.1.3 The variance gamma process as a process of finite variation

The variance gamma process is a finite variation process and the two increasing processes whose difference is the variance gamma process are both gamma processes. This is observed by considering two independent gamma processes $\gamma_p(t)$ and $\gamma_n(t)$ with mean rates $\mu_p$, $\mu_n$ and variance rates $\nu_p$, $\nu_n$ respectively for the positive and negative components. The characteristic functions of the two gamma processes are

$$E\left[\exp(iu\gamma_k(t))\right] = \left(\frac{1}{1 - iu\nu_k/\mu_k}\right)^{\mu_k^2 t/\nu_k} \quad \text{for } k = p, n.$$

Supposing that the two gamma processes have the same coefficients of variation and $\nu_k/\mu_k^2 = \nu$ for k = p, n, we may write the characteristic function of the difference of the two gamma processes as

$$E\left[\exp\left(iu(\gamma_p(t) - \gamma_n(t))\right)\right] = \left(\frac{1}{1 - iu\left(\frac{\nu_p}{\mu_p} - \frac{\nu_n}{\mu_n}\right) + u^2\frac{\nu_p}{\mu_p}\frac{\nu_n}{\mu_n}}\right)^{t/\nu}.$$

The result follows on comparing this characteristic function with that of the variance gamma process and defining the mean and variance rates of the two gamma processes to be differenced accordingly. Specifically

$$\begin{aligned}
\mu_p &= \frac{1}{2}\sqrt{\theta^2 + \frac{2\sigma^2}{\nu}} + \frac{\theta}{2}, \\
\mu_n &= \frac{1}{2}\sqrt{\theta^2 + \frac{2\sigma^2}{\nu}} - \frac{\theta}{2}, \\
\nu_p &= \mu_p^2\,\nu, \\
\nu_n &= \mu_n^2\,\nu.
\end{aligned}$$
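The identification of the two gamma processes can be verified directly on the characteristic functions. The following sketch, with hypothetical function names and arbitrarily chosen parameter values, computes the mean and variance rates from (σ, ν, θ), multiplies the characteristic functions of the positive and negative gamma components, and compares the product with the variance gamma characteristic function (8).

```python
import math

def vg_cf(u, t, sigma, nu, theta):
    """Characteristic function (8) of the variance gamma process."""
    return (1.0 - 1j * u * theta * nu + 0.5 * sigma**2 * nu * u**2) ** (-t / nu)

def gamma_rates(sigma, nu, theta):
    """Mean and variance rates (mu_p, mu_n, nu_p, nu_n) of the two gamma processes."""
    root = math.sqrt(theta**2 + 2.0 * sigma**2 / nu)
    mu_p = 0.5 * root + 0.5 * theta
    mu_n = 0.5 * root - 0.5 * theta
    return mu_p, mu_n, mu_p**2 * nu, mu_n**2 * nu

def gamma_cf(u, t, mean_rate, var_rate):
    """Characteristic function of a gamma process with the given mean and variance rates."""
    return (1.0 - 1j * u * var_rate / mean_rate) ** (-(mean_rate**2) * t / var_rate)

def gamma_difference_cf(u, t, sigma, nu, theta):
    """Characteristic function of gamma_p(t) - gamma_n(t) for independent components."""
    mu_p, mu_n, nu_p, nu_n = gamma_rates(sigma, nu, theta)
    return gamma_cf(u, t, mu_p, nu_p) * gamma_cf(-u, t, mu_n, nu_n)

if __name__ == "__main__":
    for u in (0.5, 2.0, 10.0):
        print(u, vg_cf(u, 1.0, 0.2, 0.2, -0.1), gamma_difference_cf(u, 1.0, 0.2, 0.2, -0.1))
```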


6.1.4 The Lévy density for the variance gamma process

The Lévy density for the variance gamma process is easily constructed from its representation as the difference of two gamma processes using the well known form for the Lévy density of the gamma process. It follows that the Lévy density of the variance gamma process is

$$k_X(x) = \begin{cases}
\dfrac{1}{\nu}\,\dfrac{\exp\left(-\frac{\mu_n}{\nu_n}|x|\right)}{|x|} & \text{for } x < 0, \\[2ex]
\dfrac{1}{\nu}\,\dfrac{\exp\left(-\frac{\mu_p}{\nu_p}x\right)}{x} & \text{for } x > 0.
\end{cases}$$

The basic form of the Lévy density is that of a negative exponential scaled by the reciprocal of the jump size. Just as for the gamma process, the integral of the Lévy density is infinite, so the process has an infinite arrival rate of jumps while remaining of finite variation. It is helpful to write the Lévy density in terms of the original parameters of the process and this leads to the expression

$$k_X(x) = \frac{\exp(\theta x/\sigma^2)}{\nu|x|}\exp\left(-\frac{\sqrt{2/\nu + \theta^2/\sigma^2}}{\sigma}|x|\right). \tag{9}$$

The special case of θ = 0 is a symmetric Lévy measure and hence the absence of skew. Negative values of θ give a fatter left tail and induce negative skewness. We also observe that as ν is increased the rate of exponential decay in the Lévy measure is reduced, thus raising the arrival rate of jumps of the larger size. This induces the higher kurtosis related to this parameter. The two additional parameters therefore give direct control of the two moments that data analysis indicates we need to be able to control.

6.1.5 The return density for the variance gamma process

The density of X(t; σ, ν, θ) is available in closed form and is derived in Madan, Carr and Chang (1998). This is a closed form, in that it is expressible in terms of the special functions of mathematics, in particular the modified Bessel function of the second kind. Specifically we have that the density of X(t) = x given X(0) = 0, h(x, t; σ, ν, θ) = h(x), is

$$h(x) = \frac{2\exp(\theta x/\sigma^2)}{\nu^{t/\nu}\sqrt{2\pi}\,\sigma\,\Gamma(t/\nu)}\left(\frac{x^2}{2\sigma^2/\nu + \theta^2}\right)^{\frac{t}{2\nu} - \frac{1}{4}} K_{\frac{t}{\nu} - \frac{1}{2}}\left(\frac{1}{\sigma^2}\sqrt{x^2\left(\frac{2\sigma^2}{\nu} + \theta^2\right)}\right). \tag{10}$$

There are three terms in the density, an exponential, a real power and the modified Bessel function. This is useful for maximum likelihood estimation of parameters from time series and it is also useful in providing density plots of results. Later we report on closed forms for option prices and this incorporates a closed form for the cumulative distribution function as well, that may be used to determine critical values for extreme points in value at risk calculations.
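The density can also be obtained numerically without special functions. The sketch below does not evaluate the Bessel-function form (10); instead it swaps in a plain Fourier inversion of the characteristic function (8), using the trapezoid rule on a truncated frequency range, since the Python standard library has no Bessel functions. The function names, the truncation point and the grid sizes are assumptions made for this illustration, and the parameters are chosen so that t/ν is large enough for the integrand to decay quickly.

```python
import cmath
import math

def vg_cf(u, t, sigma, nu, theta):
    """Characteristic function (8) of the variance gamma process."""
    return (1.0 - 1j * u * theta * nu + 0.5 * sigma**2 * nu * u**2) ** (-t / nu)

def vg_density_grid(xs, t, sigma, nu, theta, u_max=100.0, n=2000):
    """Approximate h(x) = (1/pi) * integral_0^inf Re[exp(-i u x) phi(u)] du on a grid of x."""
    du = u_max / n
    us = [j * du for j in range(n + 1)]
    phis = [vg_cf(u, t, sigma, nu, theta) for u in us]  # computed once, reused for every x
    densities = []
    for x in xs:
        total = 0.0
        for j, (u, phi) in enumerate(zip(us, phis)):
            weight = 0.5 if j in (0, n) else 1.0  # trapezoid end weights
            total += weight * (cmath.exp(-1j * u * x) * phi).real
        densities.append(total * du / math.pi)
    return densities

if __name__ == "__main__":
    dx = 0.02
    xs = [-2.0 + i * dx for i in range(201)]
    hs = vg_density_grid(xs, 1.0, 0.2, 0.2, -0.1)
    print("total mass:", sum(hs) * dx)
    print("mean:", sum(x * h for x, h in zip(xs, hs)) * dx)
```

A quick sanity check on such an inversion is that the density integrates to about one and that its mean is close to θt.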

6.2 The stock price process driven by a VG process

We replace Brownian motion in the classical formulation of the geometric Brownian motion model by the VG process and define the risk neutral process for the stock price S(t) by

$$S(t) = S(0)\exp\left(rt + X(t; \sigma, \nu, \theta) + \frac{t}{\nu}\ln\left(1 - \theta\nu - \frac{\sigma^2\nu}{2}\right)\right) \tag{11}$$

where r is the constant continuously compounded interest rate. Observe from the characteristic function of the VG process that

$$E\left[\exp(X(t))\right] = \phi_{X(t)}(-i) = \left(\frac{1}{1 - \theta\nu - \sigma^2\nu/2}\right)^{t/\nu} = \exp\left(-\frac{t}{\nu}\ln\left(1 - \theta\nu - \frac{\sigma^2\nu}{2}\right)\right)$$

and hence the mean rate of return on the stock, under the risk neutral process, is the interest rate by construction. We note further that the limit as ν tends to zero of $\frac{1}{\nu}\ln(1 - \theta\nu - \sigma^2\nu/2)$ is by L'Hôpital's rule −θ − σ²/2 and so for small ν this term is −θt − σ²t/2. Noting that X(t) = θG(t) + σW(G(t)) but for small ν, G(t) is essentially t, we get that

$$\ln S(t) = \ln S(0) + \left(r - \frac{\sigma^2}{2}\right)t + \sigma W(t)$$

or the familiar geometric Brownian motion model for the log of the stock price. Hence we have a generalization of the Black–Scholes and Merton models for the stock price. The generalization has introduced two new parameters ν, θ that we have observed give us control over skewness and kurtosis in the process.

6.2.1 Characteristic function of the log of the stock price

The characteristic function of ln(S(t)) is easily derived from that of X(t), and is useful in deriving option prices by Fourier methods. Specifically we have that

$$\phi_{\ln S(t)}(u) \stackrel{\text{def}}{=} E\left[\exp(iu\ln S(t))\right] = \exp\left(iu\left[\ln S(0) + rt + \frac{t}{\nu}\ln\left(1 - \theta\nu - \frac{\sigma^2\nu}{2}\right)\right]\right)\phi_{X(t)}(u) \tag{12}$$

where $\phi_{X(t)}(u)$ is the characteristic function of the VG process given in (8).
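Equations (8) and (12) translate directly into code. The sketch below is an illustrative transcription with hypothetical function names. Evaluating the characteristic function of ln S(t) at u = −i gives E[S(t)], so the martingale condition built into (11) can be checked as E[S(t)] = S(0)exp(rt). The formula requires 1 − θν − σ²ν/2 > 0, which holds for the parameter values used here.

```python
import cmath
import math

def vg_cf(u, t, sigma, nu, theta):
    """Characteristic function (8) of the VG process; u may be real or complex."""
    return (1.0 - 1j * u * theta * nu + 0.5 * sigma**2 * nu * u * u) ** (-t / nu)

def log_stock_cf(u, s0, r, t, sigma, nu, theta):
    """Characteristic function (12) of ln S(t) under the risk neutral VG process."""
    omega = math.log(1.0 - theta * nu - 0.5 * sigma**2 * nu) / nu  # compensator rate
    drift = math.log(s0) + (r + omega) * t
    return cmath.exp(1j * u * drift) * vg_cf(u, t, sigma, nu, theta)

if __name__ == "__main__":
    s0, r, t, sigma, nu, theta = 100.0, 0.05, 0.5, 0.2, 0.2, -0.1
    expected_stock = log_stock_cf(-1j, s0, r, t, sigma, nu, theta)
    print("E[S(t)] from (12):", expected_stock.real)
    print("S(0) exp(r t):    ", s0 * math.exp(r * t))
```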

6.3 Variance gamma option pricing

When the risk neutral process for the stock is described by the variance gamma process for the log of stock price as in equation (11), European call options on stock of strike K and maturity t have a price, c(S(0); K, t), that is given by evaluating the expected discounted cash flow

$$c(S(0); K, t) = E\left[e^{-rt}\max(S(t) - K, 0)\right]. \tag{13}$$

This valuation result is an application of the defining property of a risk neutral probability, that traded asset prices, when discounted by the value of the money market account, are martingales under this probability. The valuation result follows on noting that option prices at maturity equal the promised payoff. The computation of the call price in equation (13) is accomplished in closed form in Madan, Carr and Chang (1998). Other approaches at efficient computation employ Fourier inversion as described in Bakshi and Madan (1998) or improvements thereof as explained in Carr and Madan (1998). We present here a brief summary of these results. The reader is referred to the original papers for further details.⁴

⁴ Matlab programs are available for performing these computations in all the three ways described here.

6.3.1 The Madan, Carr and Chang closed form

The method employed by Madan, Carr and Chang (1998) to develop a closed form for the VG option price relies on integrating the Black–Scholes formula applied to a random gamma time, with respect to the gamma density for this time. This approach requires the explicit computation of expressions of the form

$$\Psi(a, b, \gamma) = \int_0^{\infty} N\left(\frac{a}{\sqrt{u}} + b\sqrt{u}\right)\frac{u^{\gamma - 1}\exp(-u)}{\Gamma(\gamma)}\,du, \tag{14}$$

where N(x) is the cumulative distribution function of the standard normal variate. The call option price can be explicitly computed in terms of this Ψ function. Specifically we have that

$$c(S(0); K, t) = S(0)\,\Psi\left(d\sqrt{\frac{1 - c_1}{\nu}},\ (\alpha + s)\sqrt{\frac{\nu}{1 - c_1}},\ \gamma\right) - K\exp(-rt)\,\Psi\left(d\sqrt{\frac{1 - c_2}{\nu}},\ \alpha\sqrt{\frac{\nu}{1 - c_2}},\ \gamma\right)$$

where

$$\begin{aligned}
s &= \frac{\sigma}{\sqrt{1 + \left(\frac{\theta}{\sigma}\right)^2\frac{\nu}{2}}}, \\
\alpha &= -\frac{\theta}{\sigma\sqrt{1 + \left(\frac{\theta}{\sigma}\right)^2\frac{\nu}{2}}}, \\
\gamma &= \frac{t}{\nu}, \\
c_1 &= \frac{\nu(\alpha + s)^2}{2}, \\
c_2 &= \frac{\nu\alpha^2}{2}, \\
d &= \frac{\ln\left(\frac{S(0)}{K}\right) + rt}{s} + \frac{\gamma}{s}\ln\left(\frac{1 - c_1}{1 - c_2}\right).
\end{aligned}$$

A reduction of the Ψ function (14) to the special functions of mathematics is accomplished in terms of the modified Bessel function of the second kind and the degenerate hypergeometric function of two variables with integral representation (Humbert (1920))

$$\Phi(\alpha, \beta, \gamma; x, y) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\gamma - \alpha)}\int_0^1 u^{\alpha - 1}(1 - u)^{\gamma - \alpha - 1}(1 - ux)^{-\beta}e^{uy}\,du.$$

Explicitly we have that

$$\begin{aligned}
\Psi(a, b, \gamma) ={}& \frac{c^{\gamma + \frac{1}{2}}\exp(\mathrm{sign}(a)c)(1 + u)^{\gamma}}{\sqrt{2\pi}\,\Gamma(\gamma)\,\gamma}\,K_{\gamma + \frac{1}{2}}(c)\,\Phi\left(\gamma, 1 - \gamma, 1 + \gamma; \frac{1 + u}{2}, -\mathrm{sign}(a)c(1 + u)\right) \\
&- \mathrm{sign}(a)\frac{c^{\gamma + \frac{1}{2}}\exp(\mathrm{sign}(a)c)(1 + u)^{1 + \gamma}}{\sqrt{2\pi}\,\Gamma(\gamma)(1 + \gamma)}\,K_{\gamma - \frac{1}{2}}(c)\,\Phi\left(1 + \gamma, 1 - \gamma, 2 + \gamma; \frac{1 + u}{2}, -\mathrm{sign}(a)c(1 + u)\right) \\
&+ \mathrm{sign}(a)\frac{c^{\gamma + \frac{1}{2}}\exp(\mathrm{sign}(a)c)(1 + u)^{\gamma}}{\sqrt{2\pi}\,\Gamma(\gamma)\,\gamma}\,K_{\gamma - \frac{1}{2}}(c)\,\Phi\left(\gamma, 1 - \gamma, 1 + \gamma; \frac{1 + u}{2}, -\mathrm{sign}(a)c(1 + u)\right)
\end{aligned}$$

where

$$c = |a|\sqrt{2 + b^2},$$


$$u = \frac{b}{\sqrt{2 + b^2}}.$$

Madan, Carr and Chang (1998) go on to employ this closed form in a detailed study of the empirical properties of VG option pricing, noting in particular the importance of skewness from the risk neutral viewpoint, and the ability of the VG model to flatten the implied volatility smile in option pricing.

6.3.2 Inversion of distribution function transforms (Bakshi and Madan)

Bakshi and Madan (1998) show that very generally one may write a call option price in the form

$$c(S(0); K, t) = S(0)\,\Pi_1 - K\exp(-rt)\,\Pi_2$$

where $\Pi_1$ and $\Pi_2$ are complementary distribution functions obtained on computing the integrals

$$\Pi_1 = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty}\mathrm{Re}\left[\frac{e^{-iuk}\,\phi_{\ln S(t)}(u - i)}{iu\,\phi_{\ln S(t)}(-i)}\right]du,$$

$$\Pi_2 = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty}\mathrm{Re}\left[\frac{e^{-iuk}\,\phi_{\ln S(t)}(u)}{iu}\right]du,$$

where k = ln(K) and $\phi_{\ln S(t)}(u)$ is the characteristic function of the log of the stock price given in this case by (12). Bakshi and Madan (2000) study the general spanning properties of the characteristic functions and their relationship to the spanning properties of options. They also express the general relationships between the two probability elements in option pricing, providing a discussion of cases where they are analytically linked in their transforms.

6.3.3 Inversion of the modified call price (Carr and Madan)

Carr and Madan (1998) define the Fourier transform of the modified call price by

$$\psi(v) = \int_{-\infty}^{\infty} e^{ivk + \alpha k}\,c(S(0); e^k, t)\,dk$$

where k = ln(K), and the multiplication by exp(αk) for α > 0 dampens the call price for negative values of log strike. They show generally that

$$\psi(v) = \frac{e^{-rt}\,\phi_{\ln S(t)}(v - (\alpha + 1)i)}{\alpha^2 + \alpha - v^2 + i(2\alpha + 1)v}.$$

The call option price may then be obtained on a single Fourier inversion of ψ that may also employ the fast Fourier transform to evaluate

$$c(S(0); K, t) = \frac{\exp(-\alpha k)}{\pi}\int_0^{\infty} e^{-ivk}\,\psi(v)\,dv.$$

Carr and Madan (1998) also consider other strategies for speeding up the pricing of options using the characteristic function of the log of the stock price, and the methods should be useful for a variety of Lévy processes.
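The two Fourier approaches can be compared on a single option. The sketch below is an illustration under stated assumptions and not the authors' Matlab code: it uses direct quadrature (trapezoid and midpoint rules on a truncated range) rather than the fast Fourier transform, takes the real part of the Carr and Madan integrand, and uses a damping coefficient α = 1.5, truncation points and grid sizes chosen for this example. All function names are hypothetical.

```python
import cmath
import math

def log_stock_cf(u, s0, r, t, sigma, nu, theta):
    """Characteristic function (12) of ln S(t) under the VG risk neutral process."""
    omega = math.log(1.0 - theta * nu - 0.5 * sigma**2 * nu) / nu
    drift = math.log(s0) + (r + omega) * t
    vg = (1.0 - 1j * u * theta * nu + 0.5 * sigma**2 * nu * u * u) ** (-t / nu)
    return cmath.exp(1j * u * drift) * vg

def call_carr_madan(strike, s0, r, t, sigma, nu, theta,
                    alpha=1.5, v_max=200.0, n=20000):
    """Call price by inverting the transform of the damped call price (trapezoid rule)."""
    k = math.log(strike)
    dv = v_max / n
    total = 0.0
    for j in range(n + 1):
        v = j * dv
        numerator = math.exp(-r * t) * log_stock_cf(v - (alpha + 1.0) * 1j,
                                                    s0, r, t, sigma, nu, theta)
        psi = numerator / (alpha**2 + alpha - v * v + 1j * (2.0 * alpha + 1.0) * v)
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * (cmath.exp(-1j * v * k) * psi).real
    return math.exp(-alpha * k) / math.pi * total * dv

def call_bakshi_madan(strike, s0, r, t, sigma, nu, theta, u_max=200.0, n=20000):
    """Call price as S(0)*Pi_1 - K*exp(-r t)*Pi_2 (midpoint rule, which avoids u = 0)."""
    k = math.log(strike)
    du = u_max / n
    forward_cf = log_stock_cf(-1j, s0, r, t, sigma, nu, theta)
    sum1 = 0.0
    sum2 = 0.0
    for j in range(n):
        u = (j + 0.5) * du
        phase = cmath.exp(-1j * u * k)
        shifted = log_stock_cf(u - 1j, s0, r, t, sigma, nu, theta)
        sum1 += (phase * shifted / (1j * u * forward_cf)).real
        sum2 += (phase * log_stock_cf(u, s0, r, t, sigma, nu, theta) / (1j * u)).real
    pi1 = 0.5 + sum1 * du / math.pi
    pi2 = 0.5 + sum2 * du / math.pi
    return s0 * pi1 - strike * math.exp(-r * t) * pi2

if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.5, 0.2, 0.2, -0.1)  # K, S(0), r, t, sigma, nu, theta
    print("Carr and Madan inversion:  ", call_carr_madan(*args))
    print("Bakshi and Madan inversion:", call_bakshi_madan(*args))
```

The damping coefficient must be small enough that the (α + 1)-th moment of S(t) is finite, which for the VG model means 1 − θν(α + 1) − σ²ν(α + 1)²/2 > 0.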

6.4 Results on option pricing performance

The variance gamma option pricing model was tested in Madan, Carr and Chang (1998) on data for S&P 500 options for the period January 1992 to September 1994. It was noted there that the skew is significant and the three parameter process effectively eliminates the smile in option prices in the direction of moneyness. The pricing errors are generally between 1 and 3 percent for options on the relatively liquid stocks and indices. The maturities we work with get fairly small and are as low as a couple of days at times, while the range of strikes is quite wide and may be up to 20 to 30% out-of-the-money. Yet on this wide range of strikes and low maturities the model provides adequate fits. Here we provide some illustrations of the results for options on the SPX and Nikkei indices. Figures 1 and 2 provide graphs of the prices of out-of-the-money options on these two indices along with the theoretical price curve as fit by the VG model. For strikes above at-the-money the options are calls while puts are used for the strikes below the spot. The typical V shaped price structure observed in markets is basically consistent with that of the negative exponential in the absolute value of the size of the move, that is the local structure of the VG model. The difficulty for Gaussian based models is precisely the fact that for these models option prices of out-of-the-money options fall off too rapidly, being a negative exponential in the square of the move, compared to the market. We observe here that the essential structure of price decay is consistent with the building block of completely monotone Lévy densities, the double negative exponential.

Fig. 1. Out-of-the-money option prices on the SPX index and the price curve as fit by the VG model.

Fig. 2. Out-of-the-money option prices on the Nikkei Index and the price curve fit by the VG model.

7 Asset allocation in Lévy systems

Apart from the successes of Lévy processes in option pricing, and the VG model in particular, these processes are associated with financial markets that are incomplete with respect to dynamic trading in the stock and the money market account. In such economies, with stock prices driven by an infinite arrival finite variation Lévy process, European options are market completing assets and one may study the question of the optimal demand for these assets by investors. In contrast, for the traditional economy, where options are redundant assets, there is no demand for these assets. With these observations in mind, Carr, Jin and Madan (2000) proceed to reformulate the Merton problem for optimal consumption and investment, except now the asset space is genuinely expanded to include all the European options on the stock of all strikes and maturities as well. They study the problem of optimal derivative investment and solve it in closed form for HARA utility when the statistical and risk neutral price processes are in the VG class of processes. They also show that the shape of the optimal financial derivative product is independent of preferences, time horizons and the mean rate of return on the stock, factors that influence the level of investor demand but not the shape. The latter depends primarily on the comparison between the prices of market moves and the relative frequency of their occurrence. Their analysis also suggests that demand would be highest for at-the-money low maturity options in such economies, a fact that is in accord with casual market observations.

7.1 Optimal derivative investment

Consider an economy trading a stock with price process S(t) that is a homogeneous Lévy process in the interval [0, ϒ] with a Lévy density $k_P(x)$ defined over the real line where x represents the jumps in the log of the stock price. An example is provided by the VG process of equation (11). Also trading in the economy are options on this stock with strikes K > 0 and maturities T < ϒ. The prices of these options are given by the processes c(S(t); K, T) for t < T where these prices are consistent with the absence of arbitrage and are derived in line with martingale pricing methods using the risk neutral measure that is also a homogeneous Lévy process with Lévy density $k_Q(x)$. The subscripts P and Q make the important distinction between the statistical price process and the risk neutral process, with the former assessing the relative frequency of events while the latter assesses their prices.

In such an economy we wish to study the question of optimal derivative investment. At first glance, and in analogy with the solution methods adopted in Merton (1971), this is a particularly difficult problem that is not going to be tractable from an analytical perspective. This is because we ask for the optimal positions in a doubly indexed continuum of assets, viz. the options of all strikes K > 0 and maturities T > t in a context in which many of these options (i.e. those with maturities below t) are expiring on us. Furthermore, the analytical pricing of these options is generally a complex exercise reflecting all the difficulties associated with the kinked option payoff.

For reasons of tractability, we reformulate the problem with the focus on the real uncertainty, which is the jump in the log price of the stock, $x$. We view investment, not as a decision on what assets to hold, but in the first instance as a design problem where the investor wishes to design the optimal response of his or her wealth to market moves represented by $x$. Hence we seek to determine the optimal wealth response function $w(x, u)$, which is the jump in the investor's log wealth if the market were to jump at time $u$ by the amount $x$ in the log price of the stock. The actual investment in options that delivers this optimal wealth response is a secondary problem that may be solved numerically using the spanning properties of options. The structure and solution of this secondary problem is described in further detail in Carr, Jin and Madan (2000). From the perspective of the optimal design of wealth responses, the optimal derivative investment problem may be formulated as a Markov control problem. Carr, Jin and Madan (2000) consider both the infinite time horizon problem with intermediate consumption and the finite horizon problem with no intermediate consumption. Here we present just the former. We denote by $c(t)$ the path of the flow rate of consumption per unit time and suppose the investor has a preference ordering over consumption paths represented by expected utility evaluated as

$$ u = E^P\left[\int_0^\infty \exp(-\beta s)\, U(c(s))\, ds\right] \tag{15} $$

where $P$ is the statistical probability measure, $\beta$ is the pure rate of time preference, and $U(c)$ is the instantaneous utility function. The investor wishes to choose the consumption path $c(\cdot)$ and the wealth response design $w(\cdot)$ with a view to maximizing $u$. The investor is constrained by a budget constraint that describes the evolution of wealth. The transition equation for wealth, $W(t)$, is the integral equation

$$ W(t) = W(0) + \int_0^t r W(s_-)\, ds - \int_0^t c(s)\, ds + \int_0^t\!\int_{-\infty}^{\infty} W(s_-)\left(e^{w(x,u)} - 1\right)\left(m(\omega; dx, ds) - k_Q(x)\, dx\, ds\right), \tag{16} $$

and the budget constraint requires that the wealth process be non-negative, $W(t) \geq 0$ almost surely. The first two terms of the wealth transition are standard and require no explanation, accounting for interest earnings and the financing of the consumption stream. The final term involves integration with respect to two measures. The first is the integer valued random measure $m(\omega; dx, ds)$, a Dirac delta measure counting the jumps that occur at various times of various sizes. The second is the pricing Lévy measure $k_Q(x)\, dx\, ds$. The integration with respect to $m$ accounts for the wealth changes actually experienced by the response design $w(x, u)$. The integration with respect to $k_Q(x)\, dx\, ds$ accounts for the cost of this wealth response access that must be paid for through time. The wealth transition equation (16) may be rewritten in a form more directly comparable to Merton's original equation by writing

$$ \begin{aligned} W(t) = {} & W(0) + \int_0^t r W(s_-)\, ds - \int_0^t c(s)\, ds \\ & + \int_0^t\!\int_{-\infty}^{\infty} W(s_-)\left(e^{w(x,u)} - 1\right)\left(k_P(x)\, dx\, ds - k_Q(x)\, dx\, ds\right) \\ & + \int_0^t\!\int_{-\infty}^{\infty} W(s_-)\left(e^{w(x,u)} - 1\right)\left(m(\omega; dx, ds) - k_P(x)\, dx\, ds\right) \end{aligned} \tag{17} $$

where we have just added and subtracted the integral of the wealth change with respect to the measure k P (x)d xds. In this formulation the final integral in equation (17) is a martingale under the statistical measure P and matches the term representing the martingale component of stock investment in Merton (1971). The first two terms are the same as in Merton (1971). The third term matches the term that evaluates excess returns from stock investment in Merton (1971). Here excess returns are the expected wealth change less the cost or price of this change whereas in Merton we have µ − r. The investor’s optimal derivative investment problem is to choose c(·), w(·), with a view to maximizing the utility u of equation (15) subject to the budget constraint of equation (16).

7.2 Optimal design of wealth responses

Let $J(W)$ be the optimized expected utility when the initial wealth is $W(0) = W$. It is shown in Carr, Jin and Madan (2000) that the optimal wealth response function for the infinite time horizon problem is homogeneous in time and satisfies the equation

$$ \frac{J_W(W e^{w(x)})}{J_W(W)} = \frac{k_Q(x)}{k_P(x)}. \tag{18} $$

This condition has an intuitive interpretation when it is rewritten as

$$ J_W(W e^{w(x)})\, \frac{k_P(x)}{k_Q(x)} = J_W(W), $$

which says that the expected marginal utility per initial dollar spent on cash in each state $x$ is equalized across states. If this is not the case then $w(x)$ should be altered to move funds from states with a lower marginal utility to states with a higher marginal utility. Alternatively, the marginal rate of transformation in utility between two states must equal the marginal rate of transformation in markets between the same two states. The optimal wealth response $w(x)$ is then determined from equation (18), if we know the function $J(W)$, as

$$ w(x) = \ln\left[\frac{1}{W}\, J_W^{-1}\left(J_W(W)\, \frac{k_Q(x)}{k_P(x)}\right)\right]. $$

We learn from this representation that the optimal wealth response design is built from a possibly smooth function $J_W^{-1}$ applied to the ratio of two finite variation, infinite arrival rate Lévy measures. Such Lévy measures are kinked by construction at zero, where the arrival rate goes to infinity. It follows that one would expect to see this property inherited by $w(x)$. This has the implication that, at a minimum, optimal wealth response design positions investors with different slopes of their desired wealths with respect to up and down market movements from at-the-money. Equivalently, there is a demand for short maturity at-the-money options.

7.2.1 HARA VG financial products

In the special case when the statistical and risk neutral processes are in the VG class and the utility function $U(c)$ is in the HARA (hyperbolic absolute risk aversion) class of utility functions, the optimal derivative investment problem of section 7.1 is shown in Carr, Jin and Madan (2000) to have a closed form solution where $J(W)$ is also in the HARA class of utility functions. The kinks in optimal designs discussed generally in section 7.2 can now be explicitly computed for this case. Specifically, suppose the statistical Lévy measure is symmetric and given by

$$ k_P(x) = \frac{1}{\kappa |x|} \exp\left(-\sqrt{\frac{2}{\kappa}}\, \frac{|x|}{s}\right) \tag{19} $$

where $\kappa$ is the volatility of the statistical gamma time change for a symmetric Brownian motion with volatility $s$. Further suppose that the risk neutral Lévy measure is as given by (9) with parameters $\sigma$, $\nu$ and $\theta$. Let the utility function be

$$ U(c) = \frac{\gamma}{1-\gamma}\left(\frac{\alpha}{\gamma}\, c - A\right)^{1-\gamma}. $$

In this case, defining

$$ \zeta = \frac{\theta}{\sigma^2}, \qquad \lambda = \frac{1}{s}\sqrt{\frac{2}{\kappa}} - \frac{1}{\sigma}\sqrt{\frac{2}{\nu} + \frac{\theta^2}{\sigma^2}}, $$

and letting $R$ denote the price relative of the asset price post jump to its pre jump value, the optimal product takes the form

$$ f(R) = \begin{cases} R^{-\frac{\zeta+\lambda}{\gamma}} & \text{for } R > 1, \\ R^{-\frac{\zeta-\lambda}{\gamma}} & \text{for } R < 1, \end{cases} \tag{20} $$

and the kink at-the-money is present unless $\lambda = 0$. The shape of this product is independent of the floor of the utility function and depends primarily on the statistical and risk neutral Lévy measures and on risk aversion as represented by $\gamma$. We also observe the clear impact of risk aversion on optimal product design. As we raise $\gamma$, the optimal wealth response $f(R)$ flattens out and the payoff approaches that of a bond, thereby reflecting a lack of tolerance for movements in wealth. A variety of possible shapes can arise for the optimal product and these are illustrated in Figures 3–6 for a variety of settings on the statistical and risk neutral parameters. Each figure reports three curves, for varying levels of relative risk aversion (RRA), and the flattening out of the response as we raise risk aversion is apparent in each case. Since these graphs draw optimal portfolio values against the level of the spot asset they are referred to as spot slides.

Fig. 3. Optimal spot slides in the presence of excess risk neutral kurtosis and skew.
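The behaviour of the optimal product in equation (20) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the reading $\zeta = \theta/\sigma^2$ and $\lambda = \frac{1}{s}\sqrt{2/\kappa} - \frac{1}{\sigma}\sqrt{2/\nu + \theta^2/\sigma^2}$ of the definitions preceding (20), normalises $f(1) = 1$, and uses purely illustrative parameter values. The function names are hypothetical.

```python
import math


def vg_exponent_inputs(s, kappa, sigma, nu, theta):
    """Return (zeta, lam) under the assumed reading of the definitions before (20)."""
    zeta = theta / sigma ** 2
    decay_p = math.sqrt(2.0 / kappa) / s  # exponential decay rate of k_P
    decay_q = math.sqrt(2.0 / nu + theta ** 2 / sigma ** 2) / sigma  # decay rate of k_Q
    return zeta, decay_p - decay_q


def optimal_product(price_relative, s, kappa, sigma, nu, theta, gamma):
    """Evaluate f(R) of equation (20), normalised so that f(1) = 1."""
    if price_relative <= 0.0:
        raise ValueError("price relative must be positive")
    zeta, lam = vg_exponent_inputs(s, kappa, sigma, nu, theta)
    if price_relative >= 1.0:
        exponent = -(zeta + lam) / gamma  # branch for upward moves
    else:
        exponent = -(zeta - lam) / gamma  # branch for downward moves
    return price_relative ** exponent


if __name__ == "__main__":
    # Illustrative (not calibrated) parameters: nu > kappa, negative theta.
    params = dict(s=0.15, kappa=0.1, sigma=0.2, nu=0.5, theta=-0.3)
    for gamma in (2.0, 5.0, 10.0):
        slide = [optimal_product(r, gamma=gamma, **params)
                 for r in (0.8, 0.9, 1.0, 1.1, 1.2)]
        print(gamma, [round(v, 4) for v in slide])
```

With these illustrative inputs both exponents pull the value down away from $R = 1$, giving an inverted V, and larger $\gamma$ moves every point of the slide toward the flat bond-like payoff.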

Fig. 4. Optimal spot slide for a strong skew and a mild excess kurtosis.

In Figure 3 the excess risk neutral kurtosis and skew lead to large moves being priced high relative to their likelihood; hence the optimal spot slide shorts these events and we have an inverted V shape for the spot slide. For Figure 4 the skew is strong and the kurtosis is mild. This leads to falls being overpriced while rises are underpriced. The optimal slide is basically long the asset, but the positioning with respect to rises, the up delta, and with respect to falls, the down delta, differ. For Figure 5 we have an excess statistical volatility, making large moves relatively cheap securities. This gives rise to the V shaped optimal position. Figure 6 is a reverse of the situation of Figure 4. The direction of the skew has been reversed and leads to a basically short position, with the kink induced by the behavior of the Lévy densities at the origin.

Fig. 5. Optimal spot slide when statistical volatility dominates risk neutral volatility.

Fig. 6. Optimal spot slide for a positive skewness.

8 Spot slide calibration and position measures

The inputs for constructing an optimal spot slide are fairly simple. The first step requires just the specification of the statistical or time series moments of the return distribution, from which one may infer $\kappa$ and $s$, the statistical Lévy measure parameters. The next step is to obtain data on market option prices, preferably for short maturity options, and then to estimate the risk neutral Lévy measure and its three parameters $\sigma$, $\nu$ and $\theta$. Finally, making some assumption on the coefficient of relative risk aversion in a power utility function gives us $\gamma$, and we are ready to graph the optimal spot slide describing how one should currently be positioned in the derivatives markets. For a contrast, one may compare with the actual spot slide that aggregates a trader's derivatives book and draws the response curve of the book value to market moves. We present here the results of calibrating optimal spot slides to data on actual spot slides. In the calibration we allowed for a reverse engineering of the coefficient of risk aversion $\gamma$, as there is no other way to estimate this quantity. However, we also observed that the risk neutral excess kurtosis $\nu$ is typically an order of magnitude above its statistical counterpart $\kappa$, and so we allowed $\kappa$ to be reverse engineered as well. Such an approach is defensible on noting that the variance of kurtosis estimates is of the order of the eighth moment and, as the time series involved are not very long, generally two to four years, there is some leeway in an appropriate choice of this magnitude. The other parameters, $\sigma$, $\nu$, $\theta$ and $s$, are taken at their estimated values. For a variety of underlying assets and on a number of days, we reverse engineered the values of $\gamma$ and $\kappa$ so as to match the optimal spot slide with the actual spot slide observed for that day. Remarkably, we were able in many cases to come close to actual spot slides by just a simple choice of these two parameters $(\gamma, \kappa)$.

Fig. 7. Optimal spot slide as calibrated to a book of derivatives on an index.
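To make the reverse engineering step concrete, here is a minimal sketch under strong simplifying assumptions; it is not the calibration procedure used for the figures, which matches whole slides. It assumes the observed slide has exactly the two-branch power form of equation (20), normalised to one at the money, so that its log-log slopes on the up and down sides are $-(\zeta+\lambda)/\gamma$ and $-(\zeta-\lambda)/\gamma$. With $\sigma$, $\nu$, $\theta$ and $s$ taken as given, these two slopes then identify $\gamma$ and $\kappa$ in closed form, provided $\theta \neq 0$. All function names are hypothetical.

```python
import math


def slide_log_slopes(r_up, value_up, r_down, value_down):
    """Log-log slopes of a slide normalised to 1 at R = 1, on each side of the money."""
    return (math.log(value_up) / math.log(r_up),
            math.log(value_down) / math.log(r_down))


def reverse_engineer(slope_up, slope_down, s, sigma, nu, theta):
    """Recover (gamma, kappa) from the two slopes, assuming the form of equation (20)."""
    zeta = theta / sigma ** 2
    if zeta == 0.0 or slope_up + slope_down == 0.0:
        raise ValueError("gamma is not identified when zeta or the slope sum is zero")
    decay_q = math.sqrt(2.0 / nu + theta ** 2 / sigma ** 2) / sigma
    gamma = -2.0 * zeta / (slope_up + slope_down)
    lam = -gamma * (slope_up - slope_down) / 2.0
    decay_p = lam + decay_q  # equals sqrt(2 / kappa) / s under the assumed definitions
    if decay_p <= 0.0:
        raise ValueError("slopes are inconsistent with a positive kappa")
    kappa = 2.0 / (s * decay_p) ** 2
    return gamma, kappa


if __name__ == "__main__":
    # Synthetic round trip with illustrative parameters (not market data).
    s, sigma, nu, theta = 0.15, 0.2, 0.5, -0.3
    true_gamma, true_kappa = 5.0, 0.1
    zeta = theta / sigma ** 2
    lam = (math.sqrt(2.0 / true_kappa) / s
           - math.sqrt(2.0 / nu + theta ** 2 / sigma ** 2) / sigma)
    up_value = 1.1 ** (-(zeta + lam) / true_gamma)
    down_value = 0.9 ** (-(zeta - lam) / true_gamma)
    slopes = slide_log_slopes(1.1, up_value, 0.9, down_value)
    print(reverse_engineer(slopes[0], slopes[1], s, sigma, nu, theta))
```

An actual book slide is unlikely to be an exact power function, so in practice one would presumably fit the two parameters by minimising a distance between the model and observed slides over a grid of spot levels; the closed-form inversion above only shows why two parameters suffice to match the up and down slopes.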

Figure 7 presents an example of an optimal spot slide as calibrated to an actual spot slide on a book of derivatives on an index. The ratio of $\kappa$ to $\nu$ is referred to as $\beta$ in the graph and describes the relative excess kurtosis of the subjective and risk neutral densities. Though it is often fairly small when calibrated, it is often an order of magnitude above the ratio of the statistical excess kurtosis to the risk neutral excess kurtosis. Once all these parameters have been estimated and, importantly, $\gamma$ and $\kappa$ have been inferred from data on the actual spot slide, one may infer a personalized risk neutral density. The subjective Lévy measure, determined by the parameters $s$ and $\kappa$ as described by equation (19), is transformed by the marginal utility process as described in Madan and Milne (1991) to obtain the personalized risk neutral Lévy measure $k_I(x)$ (the subscript $I$ being indicative of an individualized measure):

$$ k_I(x) = \exp(-\gamma x)\, \frac{1}{\kappa |x|} \exp\left(-\sqrt{\frac{2}{\kappa}}\, \frac{|x|}{s}\right). \tag{21} $$

The Lévy measure (21) is that of a VG process with personalized values for $\sigma_I$, $\nu_I$, $\theta_I$ given by

$$ \sigma_I = \frac{s}{\sqrt{1 - \gamma^2 s^2 \kappa / 2}}, \qquad \theta_I = -\gamma\, \frac{s^2}{1 - \gamma^2 s^2 \kappa / 2}, \qquad \nu_I = \kappa. \tag{22} $$
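The claim that the tilted measure (21) is again a VG Lévy measure can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code. It assumes the standard VG Lévy density $k(x) = \frac{1}{\nu|x|}\exp\left(\frac{\theta x}{\sigma^2} - \frac{1}{\sigma}\sqrt{\frac{2}{\nu} + \frac{\theta^2}{\sigma^2}}\,|x|\right)$ for the measure referred to as (9), and it obtains the personalized parameters by matching exponents term by term, which requires $\gamma^2 s^2 \kappa < 2$. Function names are hypothetical.

```python
import math


def vg_levy_density(x, sigma, nu, theta):
    """Assumed standard VG Levy density with parameters (sigma, nu, theta)."""
    decay = math.sqrt(2.0 / nu + theta ** 2 / sigma ** 2) / sigma
    return math.exp(theta * x / sigma ** 2 - decay * abs(x)) / (nu * abs(x))


def tilted_statistical_density(x, s, kappa, gamma):
    """Equation (21): the symmetric statistical measure (19) tilted by exp(-gamma x)."""
    base = math.exp(-math.sqrt(2.0 / kappa) * abs(x) / s) / (kappa * abs(x))
    return math.exp(-gamma * x) * base


def personalized_parameters(s, kappa, gamma):
    """Match (21) to the VG form; returns (sigma_I, nu_I, theta_I)."""
    denom = 1.0 - gamma ** 2 * s ** 2 * kappa / 2.0
    if denom <= 0.0:
        raise ValueError("tilt too strong: need gamma^2 s^2 kappa < 2")
    sigma_i = s / math.sqrt(denom)
    theta_i = -gamma * s ** 2 / denom
    return sigma_i, kappa, theta_i


if __name__ == "__main__":
    s, kappa, gamma = 0.15, 0.1, 5.0  # illustrative values only
    sigma_i, nu_i, theta_i = personalized_parameters(s, kappa, gamma)
    for x in (-0.2, -0.01, 0.01, 0.2):
        lhs = tilted_statistical_density(x, s, kappa, gamma)
        rhs = vg_levy_density(x, sigma_i, nu_i, theta_i)
        print(x, lhs, rhs)
```

The two printed columns should agree up to floating point error for every jump size, which is the sense in which (21) stays inside the VG class.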

We thus infer a personalized risk neutral process, and this may be employed to construct a personalized return density that we term a position measure, as it is reverse engineered from derivative positions being viewed as optimal and therefore reflects preferences and beliefs that are obtained by a revealed preference exercise. On completing this reverse engineering task we have available a statistical return density estimated from the time series of the return data, a risk neutral density as inferred from options data, and a position density as reverse engineered from the actual spot slide of the derivatives book. All three densities are in the VG class. Figures 8, 9, 10 and 11 present a range of samples of graphs of these densities on a variety of underlying assets. We observe a fairly diverse set of shapes of the densities, with varying degrees of skewness and kurtosis as reflected in the size of the tails on the left and the right of the distribution. Furthermore, the position density is generally closer to the statistical density than is the risk neutral density, reflecting the view that traders respect probabilities as inferred from time series, and position themselves accordingly given the market prices of market moves as reflected in the risk neutral distribution. Occasionally, however, as in the case of Figure 9, the position density may be skewed further to the left than even the risk neutral density, which is reflective of greater risk aversion on the part of the trader than is prevalent in the market.

Fig. 8. Statistical, risk neutral and position densities for the SPX.

Fig. 9. Statistical, risk neutral and position densities for RUT.

Fig. 10. Statistical, risk neutral and position densities for the MSH.

Fig. 11. Statistical, risk neutral and position densities for the DRG.

9 Conclusion

We argue here that, on the empirical evidence, the statistical and risk neutral price processes for financial assets belong to the class of purely discontinuous processes of finite variation, albeit ones of high activity, as reflected by an infinite arrival rate of jumps. Structurally, the pattern of jump arrival rates is consistent with the hypothesis of complete monotonicity, whereby arrival rates at smaller size levels are higher. Economic considerations of the absence of arbitrage point in the same direction by demonstrating that a semimartingale, the candidate no arbitrage price process, is a time changed Brownian motion, and that the increasing random process of the time change is of necessity purely discontinuous if it is not locally deterministic. The attribute of finite variation is attractive from two perspectives. The first is that it allows a separation of the up and down tick modeling of the market, and we offer two representations of such price processes that are related under complete monotonicity of the Lévy density. The second attractive feature of finite variation is its robustness, as reflected in its tolerance of parametric heterogeneity without the resulting measures being singular or disjoint in their sets of almost sure outcomes. A lack of such robustness is an inherent property of infinite variation processes, and we strongly advocate against the use of these processes as models for the price process unless there is overwhelming evidence in support of such a choice. The class of stationary processes of independent and identically distributed increments meeting our requirements is characterized as a subclass of Lévy processes. Within this class, an important and analytically rich example is provided by Brownian motion time changed by a gamma process, which combines in an interesting way two processes that are well studied in their own right. We summarize the properties of the resulting process, termed the variance gamma process. The process has two additional parameters that enable it to address skew and kurtosis. Option pricing under the variance gamma process is tractable using a variety of methods and we outline three such methods. The first is a closed form in terms of the modified Bessel function of the second kind and the degenerate hypergeometric function of two variables. The second involves two Fourier inversions for the complementary distribution function, and the third employs direct Fourier inversion for the call price using the fast Fourier transform. The results of estimations are illustrated for data on SPX and Nikkei Index options. It is observed that the model eliminates the smile in the strike direction, using effectively for this purpose its two additional parameters. Infinite arrival rate, finite variation Lévy processes with completely monotone Lévy densities are processes for the stock price for which options are market completing assets that are part of the primary assets of the economy, with a genuine demand for these assets by investors. We study the Merton problem of optimal consumption and investment with the asset space expanded to include out-of-the-money European options as investment vehicles. For HARA utility and VG statistical and risk neutral processes this problem is solved in closed form, with optimal portfolios that are kinked at-the-money and display a different slope with respect to upward and downward movements of the market. The positions reflect a role for at-the-money short maturity options, the most liquid end of the options market in practice. Using our theory of optimal derivative positioning we illustrate how one may reverse engineer the preferences and beliefs of traders from observed spot slides of the derivatives book. This allows us to infer personalized risk neutral densities from observations on positions, and we term this density the position density. Illustrations of the statistical, risk neutral and position densities are provided for comparative purposes. It is observed that position densities are generally closer to the statistical density and lie between the statistical and risk neutral densities. At times, however, they may be more skewed than the risk neutral density, reflecting risk aversion that dominates market risk aversion.

Acknowledgment

I would like to thank all my co-authors for all the hard work on the various aspects of this project. They are, in approximate chronological order, Eugene Seneta, Frank Milne, Eric Chang, Peter Carr, Helyette Geman, Marc Yor and Gurdip Bakshi. The support and encouragement offered by Claudia Albanese, Marco Avellanada, Joseph Cherian, Carl Chiarella, Jaksa Cvitanić, Nicole El Karoui, Hans Föllmer, Robert Jarrow, Yuri Kabanov, Ioannis Karatzas, Vadim Linetsky, Vincent Lacoste, Eckhardt Platen, Marc Pinsky, Stan Pliska, Phillip Protter, Raymond Rishel, Martin Schweizer, Steve Shreve, Meté Soner, and Thaleia Zariphopoulou is also greatly appreciated. Finally, I would like to acknowledge the assistance and guidance I have received from my co-workers at Morgan Stanley Dean Witter: Doug Bonard, Steven Chung, Georges Courtadon, Peter Fraenkel, Santiago Garcia, George George, Kevin Holley, Ajay Khanna, Harry Mendell, and Lisa Polsky. Any remaining errors are solely my responsibility.

References

Bakshi, G. and Chen, Z. (1997), An alternative valuation model for contingent claims, Journal of Financial Economics 44, 123–65.
Bakshi, G. and Madan, D.B. (2000), What is the probability of a stock market crash, Working Paper, University of Maryland.
Bakshi, G. and Madan, D.B. (1998), Spanning and derivative security valuation, Journal of Financial Economics 55, 205–38.
Bates, D. (1996), Jumps and stochastic volatility: exchange rate processes implicit in Deutschmark options, The Review of Financial Studies 9, 69–108.
Bertoin, J. (1996), Lévy Processes, Cambridge University Press, Cambridge.
Breeden, D. and Litzenberger, R. (1978), Prices of state contingent claims implicit in option prices, Journal of Business 51, 621–51.
Black, F. and Scholes, M. (1973), The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–54.
Carr, P., Geman, H., Madan, D.B. and Yor, M. (2000), The fine structure of asset returns: an empirical investigation, forthcoming in the Journal of Business.
Carr, P., Jin, X. and Madan, D.B. (2000), Optimal investment in derivative securities, forthcoming in Finance and Stochastics.
Carr, P. and Madan, D.B. (1999), Option valuation using the fast Fourier transform, Journal of Computational Finance 4, 61–73.
Cox, J.C., Ingersoll, J.E. and Ross, S.A. (1985), A theory of the term structure of interest rates, Econometrica 53, 385–408.
Cox, J. and Ross, S.A. (1976), The valuation of options for alternative stochastic processes, Journal of Financial Economics 3, 145–66.
Das, S. and Foresi, S. (1996), Exact solutions for bond and options prices with systematic jump risk, Review of Derivatives Research 1, 7–24.
Delbaen, F. and Schachermayer, W. (1994), A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 520–63.
Derman, E. and Kani, I. (1994), Riding on a smile, Risk 7, 32–9.
Dupire, B. (1994), Pricing with a smile, Risk 7, 18–20.
Embrechts, P., Kluppelberg, C. and Mikosch, T. (1997), Modeling Extremal Events, Springer-Verlag, Berlin.
Fama, E.F. (1965), The behavior of stock market prices, Journal of Business 38, 34–105.
Feller, W.E. (1971), An Introduction to Probability Theory and its Applications, 2nd edition, Wiley, New York.
Geman, H., Madan, D.B. and Yor, M. (2000), Time changes for Lévy processes, forthcoming in Mathematical Finance.
Harrison, J.M. and Kreps, D. (1979), Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
Harrison, J.M. and Pliska, S.R. (1981), Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and Their Applications 11, 215–60.
Heston, S.L. (1993), A closed-form solution for options with stochastic volatility with applications to bond and currency options, The Review of Financial Studies 6, 327–43.
Hull, J. and White, A. (1987), The pricing of options on assets with stochastic volatility, Journal of Finance 42, 281–300.
Humbert, P. (1920), The confluent hypergeometric functions of two variables, Proceedings of the Royal Society of Edinburgh 73–85.
Jacod, J. and Shiryaev, A. (1998), Local martingales and the fundamental asset pricing theorems in the discrete-time case, Finance and Stochastics 3, 259–73.
Jacod, J. and Shiryaev, A. (1980), Limit Theorems for Stochastic Processes, Springer-Verlag, Berlin.
Jarrow, R.A. and Madan, D. (2000), Martingales and private monetary values, forthcoming in Journal of Risk.
Kreps, D. (1981), Arbitrage and equilibrium in economies with infinitely many commodities, Journal of Mathematical Economics 8, 15–35.
Madan, D.B., Carr, P. and Chang, E. (1998), The variance gamma process and option pricing, European Finance Review 2, 79–105.
Madan, D.B. and Milne, F. (1991), Option pricing with VG martingale components, Mathematical Finance 1, 39–55.
Madan, D.B. and Seneta, E. (1989), Characteristic function estimation using maximum likelihood on transformed variables, Journal of the Royal Statistical Society ser. B, 51, 281–5.
Madan, D.B. and Seneta, E. (1990), The variance gamma (V.G.) model for share market returns, Journal of Business 63, 511–24.
Merton, R.C. (1971), Optimum consumption and portfolio rules in a continuous time model, Journal of Economic Theory 3, 373–413.
Merton, R.C. (1973), Theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–83.
Merton, R.C. (1976), Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, 125–44.
Monroe, I. (1978), Processes that can be embedded in Brownian motion, The Annals of Probability 6, 42–56.
Naik, V. and Lee, M. (1990), General equilibrium pricing of options on the market portfolio with discontinuous returns, The Review of Financial Studies 3, 493–522.
Press, J.S. (1967), A compound events model for security prices, Journal of Business 40, 317–35.
Revuz, D. and Yor, M. (1994), Continuous Martingales and Brownian Motion, Springer-Verlag, Berlin.
Rogers, C. (1997), Arbitrage with fractional Brownian motion, Mathematical Finance 7, 95–105.
Ross, S.A. (1976a), Options and efficiency, Quarterly Journal of Economics 90, 75–89.
Ross, S.A. (1976b), Arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–60.

5 Latent Variable Models for Stochastic Discount Factors

René Garcia and Éric Renault

1 Introduction

Latent variable models in finance have traditionally been used in asset pricing theory and in time series analysis. In asset pricing models, a factor structure is imposed on a collection of asset returns to describe their joint distribution at a point in time, while in time series, the dynamic behavior of a series of multivariate returns depends on common factors for which a time series process is assumed. In both cases, the fundamental role of factors is to reduce the number of correlations between a large set of variables. In the first case, the dimension reduction is cross-sectional, in the second longitudinal. Factor analysis postulates that there exists a number of unobserved common factors or latent variables which explain observed correlations. To reduce dimension, conditional independence is assumed between the observed variables given the common factors. Arbitrage pricing theory (APT) is the standard financial model where returns of an infinite sequence of risky assets with a positive definite variance–covariance matrix are assumed to depend linearly on a set of common factors and on idiosyncratic residuals. Statistically, the returns are mutually independent given the factors. Economically, the idiosyncratic risk can be diversified away to arrive at an approximate linear beta pricing: the expected return of a risky asset in excess of a risk-free asset is equal to the scalar product of the vector of asset risks, as measured by the factor betas, with the corresponding vector of prices for the risk factors. The latent GARCH factor model of Diebold and Nerlove (1989) best illustrates the type of time series model used to characterize the dynamic behavior of a set of financial returns. All returns are assumed to depend on a common latent factor and on noise. A longitudinal dimension reduction is achieved by assuming that the factor captures and subsumes the dynamic behavior of returns.¹ The imposed statistical structure is a conditional absence of correlation between the factor and the noise terms, given the whole past of the factor and the noise, while the conditional variance of the factor follows a GARCH structure. This autoregressive conditional variance structure is important for financial applications such as portfolio allocations or value-at-risk calculations. In this chapter, we aim at providing a unifying analysis of these two strands of literature through the concept of stochastic discount factor (SDF). The SDF $m_{t+1}$, also called pricing kernel, discounts future payoffs $p_{t+1}$ to determine the current price $\pi_t$ of assets:

$$ \pi_t = E[m_{t+1}\, p_{t+1} \mid J_t], \tag{1.1} $$

conditional on the information set at time $t$, $J_t$. We summarize in Section 2 the mathematics of the SDF in a conditional setting according to Hansen and Richard (1987). Practical implementation of an asset pricing formula like (1.1) requires a statistical model to characterize the joint probability distribution of $(m_{t+1}, p_{t+1})$ given $J_t$. We specify in Section 3 a dynamic statistical framework to condition the discounted payoffs on a vector of state variables. Assumptions are made on the joint probability distribution of the SDF, asset payoffs and state variables to provide a state-space modeling framework which extends standard models. Beta pricing relations amount to characterizing a vector space basis for the SDF through a limited number of factors. The coefficients of the SDF with respect to the factors are specified as deterministic functions of the state variables. Factor analysis and beta pricing with conditioning on state variables are reviewed in Section 4. In dynamic asset pricing models, one can distinguish between reduced-form time-series models, such as conditionally heteroskedastic factor models, and asset pricing models based on equilibrium. We propose in Section 5 an intertemporal asset pricing model based on conditioning on state variables, which includes stochastic volatility models as a particular case. In this respect, we stress the importance of timing in conditioning to generate instantaneous correlation effects called leverage effects, and show how it affects the pricing of stocks, bonds and European options. We make precise how this general model with latent variables relates to standard models such as the CAPM for stocks and Black and Scholes (1973) or Hull and White (1987) for options.

¹ A cross-sectional dimension reduction is also achieved if the variance–covariance matrix of residuals is assumed to be diagonal.
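As a small illustration of the pricing formula (1.1), consider a one-period economy with finitely many states and a trivial information set, so that the conditional expectation reduces to a probability-weighted sum. This is only a sketch under those assumptions; the state probabilities, SDF values and payoffs below are made-up numbers, and `sdf_price` is a hypothetical helper name.

```python
def sdf_price(probabilities, sdf, payoff):
    """Price a payoff as E[m * p] over a finite set of states."""
    if not (len(probabilities) == len(sdf) == len(payoff)):
        raise ValueError("all inputs must have one entry per state")
    return sum(q * m * p for q, m, p in zip(probabilities, sdf, payoff))


if __name__ == "__main__":
    probabilities = [0.3, 0.4, 0.3]   # bad, normal and good state (illustrative)
    sdf = [1.10, 0.95, 0.80]          # payoffs in bad states are valued more
    bond = [1.0, 1.0, 1.0]            # riskless unit payoff
    stock = [80.0, 100.0, 125.0]      # risky payoff

    print("bond price:", sdf_price(probabilities, sdf, bond))
    print("stock price:", sdf_price(probabilities, sdf, stock))
```

The bond price is simply $E[m_{t+1}]$, so its reciprocal plays the role of the gross risk-free return, while the stock price falls below the discounted expected payoff because the payoff is low exactly where the SDF is high.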

2 Stochastic discount factors and conditioning information

Since Harrison and Kreps (1979) and Chamberlain and Rothschild (1983), it is well known that, when asset markets are frictionless, portfolio prices can be characterized as a linear valuation functional that assigns prices to the portfolio payoffs. Hansen and Richard (1987) analyze asset pricing functions in the presence of conditioning information. Their main contribution is to show that these pricing functions can be represented using random variables included in the collection of payoffs from portfolios. In this section we summarize the mathematics of a stochastic discount factor in a conditional setting following Hansen and Richard (1987). We focus on one-period securities as in their original analysis. In the next section, we will provide an extended framework with state variables to accommodate multiperiod securities. We start with a probability space $(\Omega, \mathcal{A}, P)$. We denote the conditioning information, that is, the information available to economic agents at date $t$, by $J_t$, a sub-sigma algebra of $\mathcal{A}$. Agents form portfolios of assets based on this information, which includes in particular the prices of these assets. A one-period security purchased at time $t$ has a payoff $p$ at time $t + 1$. For such securities, an asset pricing model $\pi_t(\cdot)$ defines for the elements $p$ of a set $P_{t+1} \subset J_{t+1}$ of payoffs a price $\pi_t(p) \in J_t$. The payoff space includes the payoffs of primitive assets, but investors can also create new payoffs by forming portfolios.

Assumption 2.1 (Portfolio formation) $p_1, p_2 \in P_{t+1} \Longrightarrow w_1 p_1 + w_2 p_2 \in P_{t+1}$ for any variables $w_1, w_2 \in J_t$.

Since we always maintain a finite-variance assumption for asset payoffs, $P_{t+1}$ is, by virtue of Assumption 2.1, a pre-Hilbert vector space included in

$$ P^{+}_{t+1} = \{ p \in J_{t+1} ;\; E[p^2 \mid J_t] < +\infty \}, $$

which is endowed with the conditional scalar product

$$ \langle p_1, p_2 \rangle_{J_t} = E[p_1 p_2 \mid J_t]. \tag{2.1} $$

The pricing functional π t (·) is assumed to be linear on the vectorial space Pt+1 of payoffs; this is basically the standard “law of one price” assumption, that is a very weak version of a condition of no-arbitrage. Assumption 2.2 (Law of one price) For any p1 and p2 in Pt+1 and any w1 , w2 ∈ Jt : π (w1 p1 + w2 p2 ) = w1 π ( p1 ) + w2 π( p2 ). The Hilbertian structure (2.1) will be used for orthogonal projections on the set Pt+1 of admissible payoffs both in the proof of Theorem 2.3 below (a conditional version of the Riesz representation theorem) and in Section 4. Of course, this implies that we maintain an assumption of closedness for Pt+1 . Indeed, Assumption 2.2 can be extended to an infinite series of payoffs to ensure not only a property of

closedness for $P_{t+1}$ but also a continuity property for $\pi_t(\cdot)$ on $P_{t+1}$ with appropriate notions of convergence for both prices and payoffs. With these assumptions and a technical condition ensuring the existence of a payoff with nonzero price to rule out trivial pricing functions, one can state the fundamental theorem of Hansen and Richard (1987), which is a conditional extension of the Riesz representation theorem.

Theorem 2.3 There exists a unique payoff $p^*$ in $P_{t+1}$ that satisfies:
(i) $\pi_t(p) = E[p^* p | J_t]$ for all $p$ in $P_{t+1}$;
(ii) $P[E[p^{*2} | J_t] > 0] = 1$.

In other words, the particular payoff which is used to characterize any asset price is almost surely nonzero. With an additional no-arbitrage condition, it can be shown to be almost surely positive.

3 Conditioning the discounted payoffs on state variables

We just stated that, given the law of one price, a pricing function $\pi_t(\cdot)$ for a conditional linear space $P_{t+1}$ of payoffs can be represented by a particular payoff $p^*$ such that condition (i) of Theorem 2.3 is fulfilled. In this section, we do not focus on the interpretation of the stochastic discount factor as a particular payoff. Instead, we consider a time series $(m_{t+1})_{t \ge 1}$ of admissible SDFs or pricing kernels, which means that, at each date $t$, $m_{t+1}$ belongs to the set $M_{t+1}$ defined as:

$$M_{t+1} = \{ m_{t+1} \in P^{+}_{t+1} ;\ \pi_t(p_{t+1}) = E[m_{t+1} p_{t+1} | J_t],\ \forall p_{t+1} \in P_{t+1} \}. \tag{3.1}$$

For a given asset, we will write the asset pricing formula as:

$$\pi_t = E[m_{t+1} p_{t+1} | J_t]. \tag{3.2}$$
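The representation in Theorem 2.3 can be illustrated in a finite-state setting. The following is a minimal sketch, assuming a hypothetical one-period economy with four states and two primitive payoffs (the probabilities, payoffs and prices are invented for illustration, and conditioning on $J_t$ is reduced to a single information node). It builds the payoff $p^*$ as the element of the payoff space whose scalar product (2.1) with each primitive payoff equals that payoff's price, and then checks that pricing by linearity and pricing by $E[p^* p | J_t]$ agree.

```python
import numpy as np

# Hypothetical one-period economy with 4 states (all numbers are illustrative).
prob = np.array([0.1, 0.4, 0.3, 0.2])      # conditional state probabilities given J_t
X = np.array([[1.0, 0.5],
              [1.0, 1.0],
              [1.0, 1.5],
              [1.0, 2.5]])                  # columns: payoffs of two primitive assets
q = np.array([0.95, 1.20])                  # assumed prices of the two assets at date t

# Gram matrix of the conditional scalar product (2.1): G[i, j] = E[x_i x_j | J_t]
G = X.T @ (prob[:, None] * X)

# p* = X c lies in the payoff space and satisfies E[p* x_i | J_t] = q_i for each asset
c = np.linalg.solve(G, q)
p_star = X @ c

def price_by_linearity(w):
    """Price of the portfolio payoff X @ w under the law of one price."""
    return float(w @ q)

def price_by_sdf(w):
    """Price of the same payoff computed as E[p* p | J_t]."""
    return float(np.sum(prob * p_star * (X @ w)))

w = np.array([2.0, -0.7])                   # an arbitrary portfolio
second_moment = float(np.sum(prob * p_star**2))   # E[p*^2 | J_t], condition (ii)
print(price_by_linearity(w), price_by_sdf(w), second_moment)
```

Any other admissible SDF in this sketch differs from `p_star` only by a component orthogonal to the columns of `X`, which is why the set $M_{t+1}$ in (3.1) generally contains more than one element.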

For the implementation of such a pricing formula, we need to model the joint probability distribution of $(m_{t+1}, p_{t+1})$ given $J_t$. To do this, we will stress the usefulness of factors and state variables. We will suppose without loss of generality² that the future payoff is the future price of the asset itself, $\pi_{t+1}$. The problem is therefore to find the pricing function $\psi_t(J_t)$ such that:

$$\psi_t(J_t) = E[m_{t+1} \psi_t(J_{t+1}) | J_t]. \tag{3.3}$$

² As usual, if there are dividends or other cashflows, they may be included in the price by a convenient discounted sum. We will abandon this convenient expositional shortcut when we refer to more specific assets in subsequent sections.

Both factors and state variables are useful to reduce the dimension of the problem to be solved in (3.3). To see this, one can decompose the information $J_t$ into three types of variables. First, one can include asset-specific variables denoted $Y_t$, which

should contain at least the price $\pi_t$. Dividends as well as other variables which may help characterize $m_{t+1}$ could be included without really complicating matters. Second, the information will contain a vector process $F_t$ of factors. Such factors could be suggested by economic theory or chosen purely on statistical grounds. For example, in equilibrium models, a factor could be the consumption growth process. In factor models, they could be observable macroeconomic indicators or latent factors to be extracted from a universe of asset returns. In both cases these variables are viewed as explanatory factors, possibly latent, of the collection of asset prices at time $t$. The purpose of these factors is to reduce the cross-sectional dimension of the collection of assets. Third, it is worthwhile to introduce a vector process $U_t$ of exogenous state variables in order to achieve a longitudinal reduction of dimension.

Two assumptions are made about the conditional probability distribution of $(Y_t, F_t)_{1 \le t \le T}$ given $U_1^T = (U_t)_{1 \le t \le T}$ (for any $T$-tuple $t = 1, \ldots, T$ of dates of interest) to support the claim that the processes making up $U_t$ summarize the dynamics of the processes $(Y_t, F_t)$. First we assume that the state variables subsume all temporal links between the variables of interest.

Assumption 3.1 The pairs $(Y_t, F_t)$, $t = 1, \ldots, T$, are mutually independent given $U_1^T = (U_t)_{1 \le t \le T}$.

According to the standard latent factor analysis terminology, Assumption 3.1 means that the $TH$ variables $U_t \in \mathbb{R}^H$, $t = 1, \ldots, T$, provide a complete system of factors to account for the relationships between the variables $(Y_t, F_t)_{1 \le t \le T}$ (see for example Bartholomew (1987), p. 5). In the original latent variable modeling of Burt (1941) and Spearman (1927), developed in the early part of the twentieth century to study human intelligence, $Y_t$ represented an individual’s score on test number $t$ of mental ability. The basic idea was that individual scores on various tests become independent (with repeated observations on several human subjects) given a latent factor called general intelligence. In our modeling, $t$ denotes a date. When, with only one observation of the path of $(Y_t, F_t)$, $t = 1, \ldots, T$, we assume that these variables become independent given some latent state variables, it is clear that we also have in mind a standard temporal structure which provides an empirical content to this assumption. A minimal structure to impose is the natural assumption that only past and present values $U_\tau$, $\tau = 1, 2, \ldots, t$, of the state variables matter for characterizing the probability distribution of $(Y_t, F_t)$.

Assumption 3.2 The conditional probability distribution of $(Y_t, F_t)$ given $U_1^T = (U_t)_{1 \le t \le T}$ coincides, for any $t = 1, \ldots, T$, with the conditional probability distribution given $U_1^t = (U_\tau)_{1 \le \tau \le t}$.

Assumption 3.2 is the following conditional independence³ assumption:

$$(Y_t, F_t) \perp (U_{t+1}^T) \mid (U_1^t) \tag{3.4}$$

for any $t = 1, \ldots, T$. Property (3.4) coincides with the definition of noncausality by Sims (1972) insofar as Assumption 3.1 is maintained and means that $(Y, F)$ do not cause $U$ in the sense of Sims.⁴ If we are ready to assume that the joint probability distribution of all the variables of interest is defined by a density function $\ell$, Assumptions 3.1 and 3.2 are summarized by:

$$\ell[(Y_t, F_t)_{1 \le t \le T} | U_1^T] = \prod_{t=1}^{T} \ell[(Y_t, F_t) | U_1^t]. \tag{3.5}$$

The framework defined by (3.5) is very general for state-space modeling and extends such standard models as the parameter-driven models described in Cox (1981), stochastic volatility models, as well as the state-space time series models (see Harvey (1989)). Our vector $U_t$ of state variables can also be seen as a hidden Markov chain, a popular tool in nonlinear econometrics to model regime switches introduced by Hamilton (1989). The merit of Assumptions 3.1 and 3.2 for asset pricing is to summarize the relevant conditioning information by the set $U_1^t$ of current and past values of the state variables:

$$\ell[(Y_{t+1}, F_{t+1}, U_{t+1}) | (Y_\tau, F_\tau)_{1 \le \tau \le t}, U_1^t] = \ell[(Y_{t+1}, F_{t+1}, U_{t+1}) | U_1^t]. \tag{3.6}$$

In practice, to make (3.6) useful, one would like to limit the relevant past by a homogeneous Markovianity assumption.

Assumption 3.3 The conditional probability distribution of $(Y_{t+1}, F_{t+1}, U_{t+1})$ given $U_1^t$ coincides, for any $t = 1, \ldots, T$, with the conditional probability distribution given $U_t$. Moreover, this probability distribution does not depend on $t$.

This assumption implies that the multivariate process $U_t$ is homogeneous Markovian of order one.⁵

³ See Florens, Mouchart and Rollin (1990) for a systematic study of the concept of conditional independence and Florens and Mouchart (1982) for its relation with noncausality.

⁴ This noncausality concept is equivalent to the noncausality notion developed by Granger (1969). Assumption 3.2 can be equivalently replaced by an assumption stating that the state variables $U$ can be optimally forecasted from their own past, with the knowledge of past values of other variables being useless (see Renault (1999)).

⁵ As usual, since the dimension of the multivariate process $U_t$ is not limited a priori, the assumption of Markovianity of order one is not restrictive with respect to higher order Markov processes. For brevity, we will hereafter term Assumption 3.3 the assumption of Markovianity of the process $U_t$.

Given these assumptions, we are allowed to conclude that the pricing function, as characterized by (3.3), will involve the conditioning information only through the current value $U_t$ of the state variables. Indeed, (3.6) can be rewritten:

$$\ell[(Y_{t+1}, F_{t+1}, U_{t+1}) | (Y_\tau, F_\tau)_{1 \le \tau \le t}, U_1^t] = \ell[(Y_{t+1}, F_{t+1}, U_{t+1}) | U_t]. \tag{3.7}$$
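A simple process that satisfies Assumptions 3.1 to 3.3 is a regime-switching model. The sketch below is only an illustration under assumed values: the state variable $U_t$ is a hypothetical two-state homogeneous Markov chain, and the asset-specific variable $Y_t$ is drawn from a normal distribution whose mean and volatility depend on the current state only. The transition matrix, means and volatilities are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state chain for U_t (Assumption 3.3: homogeneous, order one).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                 # P[i, j] = Prob(U_{t+1} = j | U_t = i)
mu = np.array([0.01, -0.02])               # mean of Y_t in each state (assumed)
sigma = np.array([0.05, 0.15])             # volatility of Y_t in each state (assumed)

T = 2000
U = np.empty(T, dtype=int)
U[0] = 0
for t in range(1, T):
    # The next state depends on the past only through the current state.
    U[t] = rng.choice(2, p=P[U[t - 1]])

# Given the whole path of U, the Y_t are drawn independently (Assumption 3.1),
# and the distribution of Y_t depends on the path only through U_t (Assumption 3.2).
Y = mu[U] + sigma[U] * rng.standard_normal(T)

print(Y[U == 0].std(), Y[U == 1].std())
```

All serial dependence in `Y` (for instance volatility clustering) comes from the persistence of `U`, which is the sense in which the state variables subsume the temporal links.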

We have seen how the dimension reduction is achieved in the longitudinal direction. To arrive at a similar reduction in the cross-sectional direction, one needs to add an assumption about the dimension of the range of $m_{t+1}$, given the state variables $U_t$. We assume that this range is spanned by $K$ factors, $F_{kt+1}$, $k = 1, \ldots, K$, given as components of the process $F_{t+1}$.

Assumption 3.4 (SDF spanning) $m_{t+1}$ is a deterministic function of the variables $U_t$ and $F_{t+1}$.

This assumption is not as restrictive as it might appear since it can be maintained when there exists an admissible SDF $m_{t+1}$ with an unsystematic part $\varepsilon_{t+1} = m_{t+1} - E[m_{t+1} | F_{t+1}, U_t]$ that is uncorrelated, given $U_t$, with any feasible payoff $p_{t+1} \in P_{t+1}$. Actually, in this case, $\tilde{m}_{t+1} = E[m_{t+1} | F_{t+1}, U_t]$ is another admissible SDF since $E[m_{t+1} p_{t+1} | U_t] = E[\tilde{m}_{t+1} p_{t+1} | U_t]$ for any $p_{t+1} \in P_{t+1}$, and $\tilde{m}_{t+1}$ is by definition conformable to Assumption 3.4.

In Section 4 below, we will consider a linear SDF spanning, even if Assumption 3.4 allows for more general factor structures such as log-linear factor models of interest rates in Duffie and Kan (1996) and Dai and Singleton (1999) or nonlinear APT (see Bansal et al., 1993). The linear benchmark is of interest when, for statistical or economic reasons, it appears useful to characterize the SDF as an element of a particular $K$-dimensional vector space, possibly time-varying through state variables. This is in contrast with nonlinear factor pricing where structural assumptions make a linear representation irrelevant for structural interpretations, even though it would remain mathematically correct.⁶ The linear case is of course relevant when the asset pricing model is based on a linear factor model for asset returns as in Ross (1976), as we will see in the next section.

⁶ We will see in particular in Section 5 that a log-linear setting appears justified by a natural log-normal model of returns given state variables.

4 Affine regression of payoffs on factors with conditioning on state variables

The longitudinal reduction of dimension through state variables put forward in Section 3 will be used jointly with the cross-sectional reduction of dimension through factors in the context of a conditional affine regression of payoffs or returns on factors. More precisely, the factor loadings, which are the regression coefficients on factors and which are often called beta coefficients, will be considered from

a conditional viewpoint, where the conditioning information set will be summarized by state variables given (3.7). We will first introduce the conditional beta coefficients and the corresponding conditional beta pricing formulas. We will then revisit the standard asset pricing theory which underpins these conditional beta pricing formulas, namely the arbitrage pricing theory of Ross (1976) stated in a conditional factor analysis setting.

4.1 Conditional beta coefficients

We first introduce conditional beta coefficients for payoffs, then for returns.

Definition 4.1 The conditional affine regression $EL_t[p_{t+1} | F_{t+1}]$ of a payoff $p_{t+1}$ on the vector $F_{t+1}$ of factors given the information $J_t$ is defined by:

$$EL_t[p_{t+1} | F_{t+1}] = \beta_{0t} + \sum_{k=1}^{K} \beta_{kt} F_{kt+1} \tag{4.1}$$

with $\varepsilon_{t+1} = p_{t+1} - EL_t[p_{t+1} | F_{t+1}]$ satisfying $E[\varepsilon_{t+1} | J_t] = 0$ and $\mathrm{Cov}[\varepsilon_{t+1}, F_{t+1} | J_t] = 0$.

Similarly, if we denote by $r_{t+1} = p_{t+1} / \pi_t(p_{t+1})$ the return of an asset with a payoff⁷ $p_{t+1}$, we define the conditional affine regression of the return $r_{t+1}$ on $F_{t+1}$ by:

$$EL_t[r_{t+1} | F_{t+1}] = \beta^r_{0t} + \sum_{k=1}^{K} \beta^r_{kt} F_{kt+1}. \tag{4.2}$$

Of course, the beta coefficients of returns can be related to the beta coefficients of payoffs by:

$$\beta^r_{kt} = \frac{\beta_{kt}}{\pi_t(p_{t+1})} \quad \text{for } k = 0, 1, 2, \ldots, K. \tag{4.3}$$

Moreover, the characterization of conditional probability distributions in terms of returns instead of payoffs makes more explicit the role of state variables. To see this, let us describe payoffs at time $t+1$ from the price at the same date and a dividend process by:⁸

$$p_{t+1} = \pi_{t+1} + D_{t+1}. \tag{4.4}$$

⁷ Strictly speaking, the return is not defined for states of nature where $\pi_t(p_{t+1}) = 0$. This may complicate the statement of characterization of the SDF in terms of expected returns as in the main theorem (Theorem 4.4) of this section. However, this technical difficulty may be solved by considering portfolios which contain a particular asset with nonzero price in any state of nature. This technical condition ensuring the existence of such a payoff with nonzero price has already been mentioned in Section 2 (see also the sufficient condition (4.11) below when there exists a riskless asset). In what follows, the corresponding technicalities will be neglected.

⁸ As announced in Section 3, we depart from the expositional shortcut where the price included discounted dividends.

Following Assumption 3.1, we will assume that the rates of growth of dividends⁹ are asset-specific variables $Y_t$ and serially uncorrelated given state variables. In other words, $Y_t = D_t / D_{t-1}$, $t = 1, 2, \ldots, T$, are mutually independent given $U_1^T$. Moreover, $\pi_{t+1}$ in (4.4) has to be interpreted as the price at time $t+1$ of the same asset with price $\pi_t$ at time $t$ defined from the pricing functional (3.3). In other words, the pricing equation (3.3) can be rewritten:

$$\frac{\psi_t(J_t)}{D_t} = E\left[ m_{t+1} \frac{D_{t+1}}{D_t} \left( \frac{\psi_t(J_{t+1})}{D_{t+1}} + 1 \right) \Big| J_t \right]. \tag{4.5}$$

Given Assumptions 3.1, 3.2 and 3.3, we are allowed to conclude that, under general regularity conditions,¹⁰ Equation (4.5) defines a unique time-invariant deterministic function $\phi(\cdot)$ such that:

$$\phi(U_t) = E\left[ m_{t+1} \frac{D_{t+1}}{D_t} \left( \phi(U_{t+1}) + 1 \right) \Big| U_t \right]. \tag{4.6}$$

In other words, we get the following decomposition formulas for prices and returns:

$$\pi_t = \phi(U_t) D_t, \qquad r_{t+1} = \frac{\pi_{t+1} + D_{t+1}}{\pi_t} = \frac{D_{t+1}}{D_t} \, \frac{\phi(U_{t+1}) + 1}{\phi(U_t)}. \tag{4.7}$$
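When the state variable takes finitely many values, the fixed-point equation (4.6) becomes a linear system. The following is a minimal sketch under strong simplifying assumptions that are not in the text: $U_t$ is a hypothetical two-state Markov chain, and both the SDF and the dividend growth rate are taken to be functions of the next state only. With $A_{ij} = P_{ij}\, m_j\, g_j$, equation (4.6) reads $\phi = A(\phi + 1)$, which can be solved directly or by the contraction iteration behind the regularity conditions.

```python
import numpy as np

# Hypothetical two-state specification (all numbers are illustrative).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # transition matrix of U_t
m = np.array([0.93, 1.00])          # value of m_{t+1} when U_{t+1} = j
g = np.array([1.03, 0.98])          # dividend growth D_{t+1}/D_t when U_{t+1} = j

# A[i, j] = P[i, j] * m[j] * g[j], so that (4.6) reads phi = A @ (phi + 1)
A = P * (m * g)[None, :]
ones = np.ones(2)

# Direct solution of (I - A) phi = A 1
phi = np.linalg.solve(np.eye(2) - A, A @ ones)

# Same fixed point by successive approximation (A has spectral radius below 1 here)
phi_iter = np.zeros(2)
for _ in range(5000):
    phi_iter = A @ (phi_iter + ones)

# Returns from (4.7): R[i, j] is r_{t+1} when U_t = i and U_{t+1} = j
R = g[None, :] * (phi[None, :] + 1.0) / phi[:, None]

# Euler equation check: E[m_{t+1} r_{t+1} | U_t = i] should equal 1 in each state
euler = (P * m[None, :] * R).sum(axis=1)
print(phi, euler)
```

The price-dividend ratio `phi` depends on the current state only, so prices inherit the Markov structure of `U`, as stated by (4.7).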

A by-product of this decomposition is that, by application of (3.7), the joint conditional probability distribution of future factors and returns $(F_\tau, r_\tau)_{\tau > t}$ given $J_t$ depends upon $J_t$ only through $U_t$ in a homogeneous way. In particular, the conditional beta coefficients of returns are fixed deterministic functions of the current value of state variables:

$$\beta^r_{kt} = \beta^r_k(U_t) \quad \text{for } k = 0, 1, 2, \ldots, K. \tag{4.8}$$

4.2 Conditional beta pricing

From the seminal papers of Sharpe (1964) and Lintner (1965) on the unconditional CAPM to the most recent literature on conditional beta pricing (see e.g. Harvey (1991), Ferson and Korajczyk (1995)), beta coefficients with respect to well-chosen factors are put forward as convenient measures of compensated risk which explain the discrepancy between expected returns among a collection of financial assets. In order to document these traditional approaches in the modern setting of SDF, we have to add two fairly innocuous assumptions.

⁹ Stationarity (see Assumption 3.3) requires that we include the growth rates of dividends and not their levels in the variables $Y_t$.

¹⁰ These regularity conditions amount to the possibility of applying a contraction mapping argument to ensure the existence and uniqueness of a fixed point $\phi(\cdot)$ of the functional defining the right hand side of (4.6).

Assumption 4.2 If $p_{Ft+1}$ denotes the orthogonal projection (for the conditional scalar product (2.1)) of the constant vector $\iota$ on the space $P_{t+1}$ of feasible payoffs, the set $M_{t+1}$ of admissible SDFs does not contain a variable $\lambda_t p_{Ft+1}$ with $\lambda_t \in J_t$.

Assumption 4.3 Any admissible SDF has a nonzero conditional expectation given $J_t$.

Without Assumption 4.2, one could write for any $p_{t+1} \in P_{t+1}$:

$$\pi_t(p_{t+1}) = \lambda_t E[p_{Ft+1} p_{t+1} | J_t] = \lambda_t E[p_{t+1} | J_t]. \tag{4.9}$$

Therefore, all the feasible expected returns would coincide with $1/\lambda_t$. When there is a riskless asset, Assumption 4.2 simply means that an admissible SDF $m_{t+1}$ should be genuinely stochastic at time $t$, that is, not an element of the available information $J_t$ at time $t$. Without Assumption 4.3, one could write the price $\pi_t(p_{t+1})$ as:

$$\pi_t(p_{t+1}) = E[m_{t+1} p_{t+1} | J_t] = \mathrm{Cov}[m_{t+1}, p_{t+1} | J_t], \tag{4.10}$$

which would not depend on the expected payoff $E[p_{t+1} | J_t]$. When there is a riskless asset, Assumption 4.3 would be implied by a positivity requirement:¹¹

$$P[p > 0] = 1 \Rightarrow P[\pi_t(p) \le 0] = 0. \tag{4.11}$$

With these two assumptions, we can state the central theorem of this section, which links linear SDF spanning with linear beta pricing and multibeta models of expected returns.

Theorem 4.4 The three following properties are equivalent:

P1: Linear Beta Pricing: $\exists\, m_{t+1} \in M_{t+1}$, $\forall\, p_{t+1} \in P_{t+1}$:

$$\pi_t(p_{t+1}) = \beta_{0t} E[m_{t+1} | U_t] + \sum_{k=1}^{K} \beta_{kt} E[m_{t+1} F_{kt+1} | U_t], \tag{4.12}$$

P2: Linear SDF Spanning: $\exists\, m_{t+1} \in M_{t+1}$, $\exists\, \lambda_{kt} \in J_t$, $k = 0, 1, 2, \ldots, K$: $\lambda_{kt} = \lambda_k(U_t)$ and

$$m_{t+1} = \lambda_0(U_t) + \sum_{k=1}^{K} \lambda_k(U_t) F_{kt+1}, \tag{4.13}$$

P3: Multibeta Model of Expected Returns: $\exists\, \nu_{kt} \in J_t$, $k = 0, 1, 2, \ldots, K$, for any feasible return $r_{t+1}$:

$$E[r_{t+1} | U_t] = \nu_{0t} + \sum_{k=1}^{K} \nu_{kt} \beta^r_k(U_t). \tag{4.14}$$

¹¹ This positivity requirement implies the continuity of the pricing function $\pi_t(\cdot)$ needed for establishing Theorem 2.3.
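The implication from P2 to P1 and P3 can be checked numerically in a finite-state example. The sketch below is only an illustration under assumed values: the conditioning on $U_t$ is reduced to a single node with five hypothetical states, two factors, and an SDF that is affine in the factors as in (4.13). The betas are computed by a probability-weighted affine regression in the sense of Definition 4.1. The expressions used for $\nu_{0t}$ and $\nu_{kt}$ are derived here from P1 applied to a return of unit price; they are an assumption of this sketch rather than a formula quoted from the text.

```python
import numpy as np

# Hypothetical conditional distribution given U_t (illustrative numbers).
prob = np.array([0.15, 0.25, 0.20, 0.30, 0.10])
F = np.array([[ 0.10, -0.02],
              [ 0.03,  0.05],
              [-0.04,  0.01],
              [ 0.06, -0.03],
              [-0.08,  0.04]])               # K = 2 factors, one row per state
lam0, lam = 0.95, np.array([-0.5, 0.3])      # assumed coefficients in (4.13)
m = lam0 + F @ lam                           # linear SDF spanning (P2)

def expect(x):
    """Conditional expectation under the state probabilities."""
    return float(prob @ x)

def affine_betas(p):
    """Coefficients (beta_0, beta_1, ..., beta_K) of the affine regression of p on F."""
    Z = np.column_stack([np.ones(len(p)), F])
    return np.linalg.solve(Z.T @ (prob[:, None] * Z), Z.T @ (prob * p))

p = np.array([1.2, 0.9, 1.0, 1.4, 0.7])      # an arbitrary payoff
b = affine_betas(p)
EmF = (prob * m) @ F                          # E[m F_k | U_t], k = 1..K

price_sdf = expect(m * p)                     # pricing with the SDF
price_beta = b[0] * expect(m) + b[1:] @ EmF   # linear beta pricing (4.12)

# Multibeta model (4.14) for the return r = p / price
r = p / price_sdf
br = affine_betas(r)
nu0 = 1.0 / expect(m)
nu = prob @ F - EmF / expect(m)               # derived in this sketch, see lead-in
expected_return = expect(r)
multibeta_return = nu0 + br[1:] @ nu
print(price_sdf, price_beta, expected_return, multibeta_return)
```

The two prices agree because the regression residual is orthogonal to the constant and to the factors, hence to any SDF that is affine in the factors.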

Theorem 4.4 can be proved (see Renault, 1999) from three sets of assumptions: assumptions which ensure the existence of admissible SDFs (Section 2), assumptions about the state variables (Section 3), and technical Assumptions 4.2 and 4.3. Three main lessons can be drawn from Theorem 4.4:

(i) It makes explicit what we have called a cross-sectional reduction of dimension through factors, generally conceived to ensure SDF spanning, and more precisely linear SDF spanning, which corresponds to the specification (4.13) of the deterministic function referred to in Assumption 3.4. With a linear beta pricing formula, prices $\pi_t(p_{t+1})$ of a large cross-sectional collection of payoffs $p_{t+1} \in P_{t+1}$ can be computed from the prices of $K + 1$ particular “assets”:

$$\pi_t(\iota) = E[m_{t+1} | J_t] = E[m_{t+1} | U_t], \qquad \pi_t(F_{kt+1}) = E[m_{t+1} F_{kt+1} | J_t] = E[m_{t+1} F_{kt+1} | U_t], \quad k = 1, 2, \ldots, K. \tag{4.15}$$

If there does not exist a riskless asset or if some factors are not feasible payoffs, one can always interpret suitably normalized factors as returns on particular portfolios called mimicking portfolios. Moreover, since the only property of factors which matters is linear SDF spanning, one may assume without loss of generality that $\mathrm{Var}[F_{t+1} | U_t]$ is nonsingular to avoid redundant factors. The beta coefficients are then computed directly by:¹²

$$[\beta_{1t}, \beta_{2t}, \ldots, \beta_{Kt}] = \mathrm{Cov}[p_{t+1}, F_{t+1} | J_t] \, \mathrm{Var}[F_{t+1} | U_t]^{-1}, \qquad \beta_{0t} = E[p_{t+1} | J_t] - \sum_{k=1}^{K} \beta_{kt} E[F_{kt+1} | U_t] \tag{4.16}$$

to deduce the price:

$$\pi_t(p_{t+1}) = \beta_{0t} \pi_t(\iota) + \sum_{k=1}^{K} \beta_{kt} \pi_t(F_{kt+1}). \tag{4.17}$$

The cross-sectional reduction of dimension consists of computing only $K + 1$ factor prices $(\pi_t(\iota), \pi_t(F_{kt+1}))$ to price any payoff. The longitudinal reduction of dimension is also exploited since the pricing formula for these factors (4.15) depends on the conditioning information $J_t$ only through $U_t$.

¹² When the payoffs include dividends, the only relevant conditioning information is characterized by state variables:

$$\mathrm{Cov}[p_{t+1}, F_{t+1} | J_t] = D_t \, \mathrm{Cov}\left[ \frac{p_{t+1}}{D_t}, F_{t+1} \Big| U_t \right], \qquad E[p_{t+1} | J_t] = D_t \, E\left[ \frac{p_{t+1}}{D_t} \Big| U_t \right].$$

(ii) Even though the linear beta pricing formula P1 is mathematically equivalent to the linear SDF spanning property P2, it is interesting to characterize it by a property of the set of feasible returns under the maintained Assumption 3.4 of SDF spanning. More precisely, since this assumption allows us to write:

$$\pi_t(p_{t+1}) = E[m_{t+1} E[p_{t+1} | F_{t+1}, J_t] | J_t], \tag{4.18}$$

P1 is obtained as soon as a linear factor model of payoffs or returns is assumed (see e.g. Engle, Ng and Rothschild (1990)¹³). It means that the conditional expectation of payoffs given factors and $J_t$ coincides with the conditional affine regression (given $J_t$) of these payoffs on these factors:

$$E[p_{t+1} | F_{t+1}, J_t] = EL_t[p_{t+1} | F_{t+1}] = \beta_{0t} + \sum_{k=1}^{K} \beta_{kt} F_{kt+1}. \tag{4.19}$$

Such a linear factor model can for instance be deduced from an assumption of joint conditional normality of returns and factors. This is the case when factors are themselves returns on some mimicking portfolios and returns are jointly conditionally gaussian. The standard CAPM illustrates the linear structure that is obtained from such a joint normality assumption for returns. However, the main implication of linear beta pricing is the zero-price property of idiosyncratic risk ($\varepsilon_{t+1}$ in the notation of Definition 4.1) since only the systematic part of the payoff $p_{t+1}$ is compensated:¹⁴

$$\pi_t(p_{t+1}) = \pi_t(EL_t(p_{t+1} | F_{t+1})), \tag{4.20}$$

that is: $\pi_t(\varepsilon_{t+1}) = 0$. As we will see in more detail in Subsection 4.3 below, this zero-price property for the idiosyncratic risk lays the basis for the APT model developed by Ross (1976). Moreover, if a factor is not compensated because $E[m_{t+1} F_{kt+1} | U_t] = 0$, it can be forgotten in the beta pricing formula. In other words, irrespective of the statistical procedure used to build the factors, only the compensated factors have to be kept:

$$E[m_{t+1} F_{kt+1} | U_t] \neq 0 \quad \text{for } k = 1, \ldots, K. \tag{4.21}$$

¹³ However, these authors maintain simultaneously the two assumptions of linear SDF spanning and linear factor model of returns. These two assumptions are clearly redundant as explained above.

¹⁴ The prices of the systematic and idiosyncratic parts are defined, by abuse of notation, by their conditional scalar product with the SDF $m_{t+1}$.

(iii) The minimal list of factors that have to be kept may also be characterized by the spanning interpretation P2. In this respect, the number of factors is purely a matter of convention: how many factors do we want to introduce to span the one-dimensional space where the SDF evolves? The existence of the SDF proves that a one-factor model with the SDF itself as

the sole factor is always correct. The definition of $K$ factors becomes an issue for reasons such as economic interpretation, statistical procedures or financial strategies. Moreover, this definition can be changed as long as the corresponding spanned vector space is kept invariant. For instance, one may assume that, conditionally on $J_t$, the factors are mutually uncorrelated, that is, $\mathrm{Var}[F_{t+1} | J_t]$ is a nonsingular diagonal matrix. One may also rescale the factors to obtain unit variance factors (statistical motivation) or unit cost factors (financial motivation). Let us focus on the latter by assuming that:

$$E[m_{t+1} F_{kt+1} | U_t] = 1 \quad \text{for } k = 1, \ldots, K. \tag{4.22}$$

By (4.21), the factor $F_{kt+1}$ can be replaced by its scaled value $F_{kt+1} / E[m_{t+1} F_{kt+1} | U_t]$ to get (4.22) without loss of generality. Each factor can then be interpreted as a return on a portfolio (a payoff of unit price) even though we do not assume that there exists a feasible mimicking portfolio ($F_{kt+1} \in P_{t+1}$). This normalization rule allows us to prove that the coefficients in the multibeta model of expected returns (P3) are given by:

$$\nu_{kt} = E[F_{kt+1} | U_t] - \nu_{0t} \quad \text{for } k = 1, \ldots, K. \tag{4.23}$$

Since, on the other hand, it is easy to check that:

$$\nu_{0t} = \frac{1}{E[m_{t+1} | U_t]} \tag{4.24}$$

coincides with the risk-free return when there exists a risk-free asset, the multibeta model (P3) of expected returns can be rewritten in the more standard form:

$$E[r_{t+1} | U_t] - \nu_{0t} = \sum_{k=1}^{K} \beta^r_k(U_t) \left[ E[F_{kt+1} | U_t] - \nu_{0t} \right], \tag{4.25}$$

which gives the risk premium of the asset as a linear combination of the risk premia of the various factors, with weights defined by the beta coefficients viewed as risk quantities. Moreover, (4.25) is very useful for statistical inference in factor models (see in particular Subsection 4.3) since it means that the beta pricing formula is characterized by a zero intercept term in the conditional regression of net returns on net factors, given $U_t$.

4.3 Conditional factor analysis

Factor analysis with a cross-sectional point of view has been popularized by Ross (1976) to provide some foundations to multibeta models of expected returns. The basic idea is to start, for a countable sequence of assets $i = 1, 2, \ldots$, with the

decomposition of their payoffs or returns into systematic and idiosyncratic parts with respect to $K$ variables $F_{kt+1}$, $k = 1, 2, \ldots, K$, considered as candidate factors:

$$r_{it+1} = \beta^r_{i0}(U_t) + \sum_{k=1}^{K} \beta^r_{ik}(U_t) F_{kt+1} + \varepsilon_{it+1}, \qquad E[\varepsilon_{it+1} | U_t] = 0, \qquad \mathrm{Cov}[F_{kt+1}, \varepsilon_{it+1} | U_t] = 0 \ \ \forall k = 1, 2, \ldots, K, \quad \text{for } i = 1, 2, \ldots \tag{4.26}$$

Since, as already explained, the multibeta model (P3) of expected returns amounts to assuming that idiosyncratic risks are not compensated, that is:

$$E[m_{t+1} \varepsilon_{it+1} | U_t] = 0 \quad \text{for } i = 1, 2, \ldots, \tag{4.27}$$

a natural way to look for foundations of this pricing model is to ask why idiosyncratic risk should not be compensated. Ross (1976) provides the following explanation. For a portfolio in the $n$ assets defined by shares $\theta_{in}$, $i = 1, 2, \ldots, n$, of wealth invested:

$$\sum_{i=1}^{n} \theta_{in} = 1, \tag{4.28}$$

the unsystematic risk is measured by:

$$\mathrm{Var}\left[ \sum_{i=1}^{n} \theta_{in} \varepsilon_{it+1} \Big| U_t \right] = \sum_{i=1}^{n} \theta_{in}^2 \sigma_i^2(U_t), \tag{4.29}$$

if we assume that the individual idiosyncratic risks are mutually uncorrelated:

$$\mathrm{Cov}[\varepsilon_{it+1}, \varepsilon_{jt+1} | U_t] = 0 \quad \text{if } i \neq j, \tag{4.30}$$

and we denote the asset idiosyncratic conditional variances by $\sigma_i^2(U_t) = \mathrm{Var}[\varepsilon_{it+1} | U_t]$. Therefore, if it is possible to find a sequence $(\theta_{in})_{1 \le i \le n}$, $n = 1, 2, \ldots$, conformable to (4.28) and (4.31) below:

$$\operatorname*{P\,lim}_{n \to \infty} \sum_{i=1}^{n} \theta_{in}^2 \sigma_i^2(U_t) = 0, \tag{4.31}$$

the idiosyncratic risk can be diversified and should not be compensated by a simple no-arbitrage argument. Typically, this result will be valid with bounded conditional variances and equally-weighted portfolios (θ in = 1/n for i = 1, 2, . . .). In other words, according to Ross (1976), factors have as a basic property to define idiosyncratic risks which are mutually uncorrelated. This justifies beta pricing

with respect to them and provides the following decomposition of the conditional covariance matrix of returns:

$$\Sigma_t = \beta_t \varphi_t \beta_t' + D_t \tag{4.32}$$

where $\Sigma_t$, $\beta_t$, $\varphi_t$, $D_t$ are matrices of respective sizes $n \times n$, $n \times K$, $K \times K$ and $n \times n$ defined by:

$$\Sigma_t = \left( \mathrm{Cov}(r_{it+1}, r_{jt+1} | U_t) \right)_{1 \le i \le n,\, 1 \le j \le n}, \quad \beta_t = \left( \beta^r_{ik}(U_t) \right)_{1 \le i \le n,\, 1 \le k \le K}, \quad \varphi_t = \left( \mathrm{Cov}(F_{kt+1}, F_{lt+1} | U_t) \right)_{1 \le k \le K,\, 1 \le l \le K}, \quad D_t = \left( \mathrm{Cov}(\varepsilon_{it+1}, \varepsilon_{jt+1} | U_t) \right)_{1 \le i \le n,\, 1 \le j \le n} \tag{4.33}$$

with the maintained assumption that $D_t$ is a diagonal matrix. In the particular case where returns and factors are jointly conditionally gaussian given $U_t$, the returns are mutually independent knowing the factors in the conditional probability distribution given $U_t$. We have therefore specified a Factor Analysis model in a conditional setting. Moreover, if one adopts in such a setting some well-known results in the Factor Analysis methodology, one can claim that the model is fully defined by the decomposition (4.32) of the covariance matrix of returns with the diagonality assumption¹⁵ about the idiosyncratic variance matrix $D_t$. In particular, this decomposition defines by itself the set of $K$-dimensional variables $F_{t+1}$ conformable to it with the interpretation (4.33) of the matrices:

$$F_{t+1} = E[F_{t+1} | U_t] + \varphi_t \beta_t' \Sigma_t^{-1} (r_{t+1} - E[r_{t+1} | U_t]) + z_{t+1}, \tag{4.34}$$

where $r_{t+1} = (r_{it+1})_{1 \le i \le n}$ and $z_{t+1}$ is a $K$-dimensional variable assumed to be independent of $r_{t+1}$ given $J_t$ and such that:

$$E[z_{t+1} | J_t] = 0, \qquad \mathrm{Var}[z_{t+1} | J_t] = \varphi_t - \varphi_t \beta_t' \Sigma_t^{-1} \beta_t \varphi_t. \tag{4.35}$$
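The matrices in (4.32) to (4.35) can be assembled directly for a small example. The sketch below assumes hypothetical loadings, factor covariance and idiosyncratic variances for $n = 4$ assets and $K = 2$ factors at a given value of $U_t$ (none of these numbers come from the text). It builds $\Sigma_t$ from (4.32), the matrix $\varphi_t \beta_t' \Sigma_t^{-1}$ that maps return surprises into the predictable part of the factors in (4.34), and the residual variance of $z_{t+1}$ in (4.35).

```python
import numpy as np

# Hypothetical conditional factor structure for n = 4 assets and K = 2 factors.
beta = np.array([[1.0,  0.2],
                 [0.8, -0.3],
                 [1.2,  0.5],
                 [0.5,  1.0]])                  # n x K loadings (assumed)
phi = np.array([[0.04, 0.01],
                [0.01, 0.02]])                  # K x K factor covariance (assumed)
D = np.diag([0.010, 0.020, 0.015, 0.030])       # diagonal idiosyncratic variances

Sigma = beta @ phi @ beta.T + D                 # decomposition (4.32)
B = phi @ beta.T @ np.linalg.inv(Sigma)         # K x n matrix appearing in (4.34)
Vz = phi - B @ beta @ phi                       # Var[z_{t+1} | J_t] from (4.35)

def factor_scores(r, mean_r, mean_F):
    """Predictable part of the factors given returns, i.e. (4.34) without z_{t+1}."""
    return mean_F + B @ (r - mean_r)

mean_r = np.array([0.05, 0.04, 0.06, 0.03])     # assumed conditional mean returns
mean_F = np.array([0.02, 0.01])                 # assumed conditional factor means
r = mean_r + np.array([0.10, 0.05, 0.12, -0.02])
print(factor_scores(r, mean_r, mean_F))
print(np.linalg.eigvalsh(Vz))                   # eigenvalues of the residual variance
```

The residual variance `Vz` measures the factor indeterminacy: it shrinks as the idiosyncratic variances in `D` become small relative to the systematic part `beta @ phi @ beta.T`.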

It means that, up to an independent noise $z_{t+1}$ (which represents factor indeterminacy), the factors are rebuilt by the so-called “Thompson Factor scores”:

$$\hat{F}_{t,t+1} = E[F_{t+1} | U_t] + \varphi_t \beta_t' \Sigma_t^{-1} (r_{t+1} - E(r_{t+1} | U_t)), \tag{4.36}$$

which correspond to the conditional expectation $\hat{F}_{t,t+1} = E[F_{t+1} | U_t, r_{t+1}]$ in the particular case where returns and factors are jointly gaussian given $U_t$.

¹⁵ Chamberlain and Rothschild (1983) have proposed to take advantage of the sequence model ($n \to \infty$) to weaken the diagonality assumption on $D_t$ by defining an approximate factor structure. We consider here a factor structure for fixed $n$.

To summarize, according to Ross (1976) adapted in a conditional setting with latent variables, the question of specifying a multibeta model of expected returns

can be addressed in two steps. In a first step, one should identify a factor structure for the family of returns:

$$\Sigma_t = \beta_t \varphi_t \beta_t' + D_t, \quad D_t \text{ diagonal}. \tag{4.37}$$

In a second step, the issue of a multibeta model for expected returns is addressed:¹⁶

$$E[r_{t+1} | U_t] = \beta_t E[F_{t+1} | U_t]. \tag{4.38}$$

Due to the difficulty of disentangling the dynamics of the beta coefficients in $\beta_t$ from those of the factors, both at first order $E[F_{t+1} | U_t]$ in (4.38) and at second order $\varphi_t = \mathrm{Var}[F_{t+1} | U_t]$ in (4.37), a common solution in the literature is to add the quite restrictive assumption that the matrix $\beta_t$ of conditional factor loadings is deterministic and time invariant:

$$\beta_t = \beta \quad \text{for every } t. \tag{4.39}$$

It should be noticed that assumption (4.39) does not imply per se that conditional betas coincide with unconditional ones since unconditional betas are not unconditional expectations of conditional ones. However, since by (4.39):

$$r_{t+1} = E(r_{t+1} | U_t) - \beta E(F_{t+1} | U_t) + \beta F_{t+1} + \varepsilon_{t+1}, \tag{4.40}$$

it can be seen that $\beta$ will coincide with the matrix of unconditional betas if and only if:

$$\mathrm{Cov}[E(r_{t+1} | U_t) - \beta E(F_{t+1} | U_t), F_{t+1} | U_t] = 0. \tag{4.41}$$

In particular, if the conditional multibeta model (4.38) of expected returns and the assumption (4.39) of constant conditional betas are maintained simultaneously, the unconditional multibeta model of expected returns can be deduced:

$$E r_{t+1} = \beta E F_{t+1}. \tag{4.42}$$

Moreover, this joint assumption guarantees that the conditional factor analytic model (4.40) can be identified by a standard procedure of static factor analysis since:

$$\mathrm{Var}(\varepsilon_{t+1}) = E(\mathrm{Var}(\varepsilon_{t+1} | U_t)) = E(D_t) \tag{4.43}$$

will be a diagonal matrix, like $D_t$. This remark has been fully exploited by King, Sentana and Wadhwani (1994).

¹⁶ According to the comments following Theorem 4.4, we assume that factors are suitably scaled in order to get the convenient interpretation for the coefficients of the multibeta model of expected returns. Such a scaling can be done without loss of generality since it does not modify the property (4.37). Moreover, in (4.38), returns and factors are implicitly considered in excess of the risk-free rate (net returns and factors).

However, a general inference methodology for the

conditional factor analytic model remains to be stated. First, the restrictive assumption of fixed conditional betas should be relaxed. Second, even with fixed betas, one would like to be able to identify the conditional factor analytic model (4.40) without maintaining the joint hypothesis (4.38) of a multibeta model of expected returns. In this latter case, a factor stochastic volatility approach (see e.g. Meddahi and Renault (1996) and Pitt and Shephard (1999)) should be well-suited. The close link between our general state variable setting and the now widespread stochastic volatility model is discussed in the next section.

5 A dynamic asset pricing model with latent variables

In the last section, we analyzed the cross-sectional restrictions imposed by financial asset pricing theories in the context of factor models. While these factor models were conditioned on an information set, the emphasis was not put on the dynamic behavior of asset returns. In this section, we propose an intertemporal asset pricing model based on a conditioning on state variables. Using assumptions spelled out in Section 3, we will accommodate a rich intertemporal framework where the stochastic discount factor can represent nonseparable preferences such as recursive utility.¹⁷

¹⁷ In the proposed intertemporal asset pricing model, we will specify the stochastic discount factor in an equilibrium setting. We will therefore make our stochastic assumptions on economic fundamentals such as consumption and dividend growth rates. In Garcia, Luger and Renault (1999), we make the same types of assumptions directly on the pair SDF-stock returns without reference to an equilibrium model. Similar asset pricing formulas and implications of the presence of leverage effects are obtained in this less specific framework.

5.1 An equilibrium asset pricing model with recursive utility

Many identical infinitely lived agents maximize their lifetime utility and receive each period an endowment of a single nonstorable good. We specify a recursive utility function of the form:

$$V_t = W(C_t, \mu_t), \tag{5.1}$$

where $W$ is an aggregator function that combines current consumption $C_t$ with $\mu_t = \mu(\tilde{V}_{t+1} | J_t)$, a certainty equivalent of random future utility $\tilde{V}_{t+1}$, given the information available to the agents at time $t$, to obtain the current-period lifetime utility $V_t$. Following Kreps and Porteus (1978), Epstein and Zin (1989) propose the CES function as the aggregator function, i.e.

$$V_t = [C_t^{\rho} + \beta \mu_t^{\rho}]^{1/\rho}. \tag{5.2}$$

The way the agents form the certainty equivalent of random future utility is based on their risk preferences, which are assumed to be isoelastic, i.e. $\mu_t^{\alpha} = E[\tilde{V}_{t+1}^{\alpha} | J_t]$,


where $\alpha \le 1$ is the risk aversion parameter ($1 - \alpha$ is the Arrow–Pratt measure of relative risk aversion). Given these preferences, the following Euler condition must be valid for any asset $j$ if an agent maximizes his lifetime utility (see Epstein and Zin (1989)):

$$E\left[\beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\gamma(\rho-1)} M_{t+1}^{\gamma-1} R_{j,t+1} \,\Big|\, J_t\right] = 1, \tag{5.3}$$

where $M_{t+1}$ represents the return on the market portfolio, $R_{j,t+1}$ the return on any asset $j$, and $\gamma = \alpha/\rho$. The stochastic discount factor is therefore given by:

$$m_{t+1} = \beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\gamma(\rho-1)} M_{t+1}^{\gamma-1}. \tag{5.4}$$

The parameter $\rho$ is associated with intertemporal substitution, since the elasticity of intertemporal substitution is $1/(1-\rho)$. The position of $\alpha$ with respect to $\rho$ determines whether the agent has a preference towards early resolution of uncertainty ($\alpha < \rho$) or late resolution of uncertainty ($\alpha > \rho$).18 Since the market portfolio price, say $P_t^M$ at time $t$, is determined in equilibrium, it should also verify the first-order condition:

$$E\left[\beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\gamma(\rho-1)} M_{t+1}^{\gamma} \,\Big|\, J_t\right] = 1. \tag{5.5}$$

In this model, the payoff of the market portfolio at time $t$ is the total endowment of the economy $C_t$. Therefore the return on the market portfolio $M_{t+1}$ can be written as follows:

$$M_{t+1} = \frac{P_{t+1}^M + C_{t+1}}{P_t^M}.$$

Replacing $M_{t+1}$ by this expression, we obtain:

$$\lambda_t^{\gamma} = E\left[\beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\gamma\rho} (\lambda_{t+1} + 1)^{\gamma} \,\Big|\, J_t\right], \tag{5.6}$$

where: $\lambda_t = P_t^M / C_t$. The pricing of assets with price $S_t$ which pay dividends $D_t$, such as stocks, will lead us to characterize the joint probability distribution of the stochastic process $(X_t, Y_t, J_t)$ where: $X_t = \log(C_t/C_{t-1})$ and $Y_t = \log(D_t/D_{t-1})$. As announced in Section 3, we define these dynamics through a stationary vector process of state variables $U_t$ so that:

$$J_t = \vee_{\tau \le t}\,[X_\tau, Y_\tau, U_\tau]. \tag{5.7}$$

18 As mentioned in Epstein and Zin (1991), the association of risk aversion with $\alpha$ and intertemporal substitution with $\rho$ is not fully clear, since at a given level $\alpha$ of risk aversion, changing $\rho$ affects not only the elasticity of intertemporal substitution but also determines whether the agent will prefer early or late resolution of uncertainty.
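The step from (5.5) to (5.6) is short but easy to lose track of. A possible way to spell it out, using only the definitions $\lambda_t = P_t^M / C_t$ and $M_{t+1} = (P_{t+1}^M + C_{t+1})/P_t^M$ given in the text, is the following sketch:

```latex
\begin{align*}
M_{t+1} &= \frac{P_{t+1}^M + C_{t+1}}{P_t^M}
         = \frac{(\lambda_{t+1} + 1)\,C_{t+1}}{\lambda_t\, C_t}, \\
1 &= E\left[\beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\gamma(\rho-1)}
      \left(\frac{\lambda_{t+1}+1}{\lambda_t}\right)^{\gamma}
      \left(\frac{C_{t+1}}{C_t}\right)^{\gamma} \,\Big|\, J_t\right]
   \quad \text{(substituting into (5.5))}, \\
\lambda_t^{\gamma} &= E\left[\beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\gamma\rho}
      (\lambda_{t+1}+1)^{\gamma} \,\Big|\, J_t\right]
   \quad \text{(since } \lambda_t \text{ is known given } J_t\text{)}.
\end{align*}
```

The exponents combine as $\gamma(\rho-1) + \gamma = \gamma\rho$, which equals $\alpha$ because $\gamma = \alpha/\rho$.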


Given this model structure (with $\log(C_t/C_{t-1})$ serving as a factor $F_t$), we can restate Assumptions 3.1 and 3.2 as:

Assumption 5.1 The pairs $(X_t, Y_t)$, $t = 1, \ldots, T$, are mutually independent knowing $U_1^T = (U_t)_{1 \le t \le T}$.

Assumption 5.2 The conditional probability distribution of $(X_t, Y_t)$ given $U_1^T = (U_t)_{1 \le t \le T}$ coincides, for any $t = 1, \ldots, T$, with the conditional probability distribution given $U_1^t = (U_\tau)_{1 \le \tau \le t}$.

As mentioned in Section 3, Assumptions 5.1 and 5.2 together with Assumption 3.3 and the Markovianity of state variables $U_t$ allow us to characterize the joint probability distribution of the $(X_t, Y_t)$ pairs, $t = 1, \ldots, T$, given $U_1^T$, by:

$$\ell\left[(X_t, Y_t)_{1 \le t \le T} \mid U_1^T\right] = \prod_{t=1}^{T} \ell\left[X_t, Y_t \mid U_t\right]. \tag{5.8}$$

Proposition 5.3 below provides the exact relationship between the state variables and equilibrium prices.

Proposition 5.3 Under Assumptions 5.1 and 5.2 we have:

$$P_t^M = \lambda(U_t)\,C_t, \qquad S_t = \varphi(U_t)\,D_t,$$

where $\lambda(U_t)$ and $\varphi(U_t)$ are respectively defined by:

$$\lambda(U_t)^{\gamma} = E\left[\beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\gamma\rho} (\lambda(U_{t+1}) + 1)^{\gamma} \,\Big|\, U_t\right],$$

and

$$\varphi(U_t) = E\left[\beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\gamma\rho-1} \left(\frac{\lambda(U_{t+1}) + 1}{\lambda(U_t)}\right)^{\gamma-1} (\varphi(U_{t+1}) + 1)\,\frac{D_{t+1}}{D_t} \,\Big|\, U_t\right].$$

Therefore, the functions $\lambda(\cdot)$, $\varphi(\cdot)$ are defined on $\mathbb{R}^P$ if there are $P$ state variables. Moreover, the stationarity property of the $U$ process together with Assumptions 5.1, 5.2 and a suitable specification of the density function (3.6) allow us to make the process $(X, Y)$ stationary by a judicious choice of the initial distribution of $(X, Y)$. In this setting, a contraction mapping argument may be applied as in Lucas (1978) to characterize the functions $\lambda(\cdot)$ and $\varphi(\cdot)$ according to Proposition 5.3. It should be stressed that this framework is more general than the Lucas one because the state variables $U_t$ are given by a general multivariate Markovian process (while a Markovian dividend process is the only state variable in Lucas (1978)). Using the return definition for the market portfolio and asset $S_t$, we can write:

$$\log M_{t+1} = \log\frac{\lambda(U_{t+1}) + 1}{\lambda(U_t)} + X_{t+1}, \quad\text{and}\quad \log R_{t+1} = \log\frac{\varphi(U_{t+1}) + 1}{\varphi(U_t)} + Y_{t+1}. \tag{5.9}$$

Hence, the return processes (Mt+1 , Rt+1 ) are stationary as U, X and Y , but, contrary to the stochastic setting in the Lucas (1978) economy, are not Markovian due to the presence of unobservable state variables U . Given this intertemporal model with latent variables, we will show how standard asset pricing models will appear as particular cases under some specific configurations of the stochastic framework. In particular, we will analyze the pricing of bonds, stocks and options and show under which conditions the usual models such as the CAPM or the Black–Scholes model are obtained.

5.2 Revisiting asset pricing theories for bonds, stocks and options through the leverage effect

In this section, we introduce an additional assumption on the probability distribution of the fundamentals $X$ and $Y$ given the state variables $U$.

Assumption 5.4

$$\begin{pmatrix} X_{t+1} \\ Y_{t+1} \end{pmatrix} \Big|\, U_t^{t+1} \sim \mathcal{N}\left( \begin{pmatrix} m_{Xt+1} \\ m_{Yt+1} \end{pmatrix}, \begin{pmatrix} \sigma^2_{Xt+1} & \sigma_{XYt+1} \\ \sigma_{XYt+1} & \sigma^2_{Yt+1} \end{pmatrix} \right),$$

where $m_{Xt+1} = m_X(U_1^{t+1})$, $m_{Yt+1} = m_Y(U_1^{t+1})$, $\sigma^2_{Xt+1} = \sigma^2_X(U_1^{t+1})$, $\sigma_{XYt+1} = \sigma_{XY}(U_1^{t+1})$, $\sigma^2_{Yt+1} = \sigma^2_Y(U_1^{t+1})$.

In other words, these mean and variance covariance functions are time-invariant and measurable functions with respect to $U_t^{t+1}$, which includes both $U_t$ and $U_{t+1}$. This conditional normality assumption allows for skewness and excess kurtosis in unconditional returns. It is also useful for recovering as a particular case the Black–Scholes formula.19

19 It can also be argued that, if one considers that the discrete-time interval is somewhat arbitrary and can be infinitely split, log-normality (conditional on state variables $U$) is obtained as a consequence of a standard central limit argument given the independence between consecutive $(X, Y)$ given $U$.
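As a rough numerical illustration of the fixed-point characterization of $\lambda(\cdot)$ in Proposition 5.3, the sketch below iterates the defining equation on a finite state space. Several ingredients are assumptions made only for this example and are not taken from the chapter: the state variable is a two-state Markov chain with transition matrix `P`, the conditional mean and variance of $X_{t+1}$ depend on $U_{t+1}$ only, and all parameter values are hypothetical. Under conditional normality, $E[\exp(\alpha X_{t+1}) \mid U_{t+1}] = \exp(\alpha m_X + \tfrac12 \alpha^2 \sigma_X^2)$, and $\gamma\rho = \alpha$.

```python
import numpy as np

def solve_lambda(P, m_x, s2_x, beta, alpha, rho, tol=1e-12, max_iter=10_000):
    """Iterate lambda(u)^gamma = beta^gamma * E[exp(alpha*X') * (lambda(u')+1)^gamma | u].

    P     : transition matrix of the (assumed) finite-state Markov chain U
    m_x   : conditional mean of X_{t+1} in each next-period state (assumption)
    s2_x  : conditional variance of X_{t+1} in each next-period state (assumption)
    """
    gamma = alpha / rho
    # E[exp(alpha * X_{t+1}) | U_{t+1} = u'] under conditional normality
    w = np.exp(alpha * m_x + 0.5 * alpha**2 * s2_x)
    lam = np.zeros(len(m_x))
    for _ in range(max_iter):
        rhs = beta**gamma * (P @ (w * (lam + 1.0) ** gamma))
        new = rhs ** (1.0 / gamma)
        if np.max(np.abs(new - lam)) < tol:
            return new
        lam = new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical two-state example: state 0 = high growth / low volatility.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
m_x = np.array([0.02, 0.0])
s2_x = np.array([0.01**2, 0.03**2])
lam = solve_lambda(P, m_x, s2_x, beta=0.95, alpha=-1.0, rho=0.5)
print(lam)  # price-consumption ratio of the market portfolio in each state
```

Plain iteration is used here in place of a formal contraction argument, and convergence is only checked numerically for these parameter values; other parameter choices may require a different scheme.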


5.2.1 The pricing of bonds

The price of a bond delivering one unit of the good at time $T$, $B(t, T)$, is given by the following formula:

$$B(t, T) = E_t[\tilde B(t, T)], \tag{5.10}$$

where:

$$\tilde B(t, T) = \beta^{\gamma(T-t)}\, a_t^T(\gamma)\, \exp\left((\alpha - 1)\sum_{\tau=t}^{T-1} m_{X\tau+1} + \frac{1}{2}(\alpha - 1)^2 \sum_{\tau=t}^{T-1} \sigma^2_{X\tau+1}\right),$$

with:

$$a_t^T(\gamma) = \prod_{\tau=t}^{T-1}\left[\frac{1 + \lambda(U_1^{\tau+1})}{\lambda(U_1^{\tau})}\right]^{\gamma-1}.$$

This formula shows how the interest rate risk is compensated in equilibrium, and in particular how the term premium is related to preference parameters. To be more explicit about the relationship between the term premium and the preference parameters, let us first notice that we have a natural factorization:

$$\tilde B(t, T) = \prod_{\tau=t}^{T-1} \tilde B(\tau, \tau + 1). \tag{5.11}$$

Therefore, while the discount parameter $\beta$ affects the level of $\tilde B$, the two other parameters $\alpha$ and $\gamma$ affect the term premium (with respect to the return-to-maturity expectations hypothesis, Cox, Ingersoll, and Ross (1981)) through the ratio:

$$\frac{B(t, T)}{E_t \prod_{\tau=t}^{T-1} B(\tau, \tau + 1)} = \frac{E_t\left(\prod_{\tau=t}^{T-1} \tilde B(\tau, \tau + 1)\right)}{E_t \prod_{\tau=t}^{T-1} E_\tau \tilde B(\tau, \tau + 1)}.$$

To better understand this term premium from an economic point of view, let us compare implicit forward rates and expected spot rates at only one intermediary period $\tau$ between $t$ and $T$:

$$\frac{B(t, T)}{B(t, \tau)} = \frac{E_t[\tilde B(t, \tau)\,\tilde B(\tau, T)]}{E_t \tilde B(t, \tau)} = E_t \tilde B(\tau, T) + \frac{\mathrm{Cov}_t[\tilde B(t, \tau), \tilde B(\tau, T)]}{E_t \tilde B(t, \tau)}. \tag{5.12}$$

Up to Jensen's inequality, Equation (5.12) proves that a positive term premium is brought about by a negative covariation between present and future $\tilde B$. Given the expression for $\tilde B(t, T)$ above, it can be seen that for von Neumann preferences ($\gamma = 1$) the term premium is proportional to the square of the coefficient of relative risk aversion (up to a conditional stochastic volatility effect). Another important observation is that even without any risk aversion ($\alpha = 1$), preferences still affect the term premium through the nonindifference to the timing of uncertainty resolution ($\gamma \ne 1$).

There is however an important sub-case where the term premium will be preference-free because the stochastic discount factor $\tilde B(t, T)$ coincides with the observed rolling-over discount factor (the product of short-term future bond prices, $B(\tau, \tau + 1)$, $\tau = t, \ldots, T - 1$). Taking Equation (5.11) into account, this will occur as soon as $\tilde B(\tau, \tau + 1) = B(\tau, \tau + 1)$, that is when $\tilde B(\tau, \tau + 1)$ is known at time $\tau$. From the expression of $\tilde B(t, T)$ above, it is easy to see that this last property holds if and only if the mean and variance parameters $m_{X\tau+1}$ and $\sigma_{X\tau+1}$ depend on $U_\tau^{\tau+1}$ only through $U_\tau$. This allows us to highlight the so-called "leverage effect" which appears when the probability distribution of $X_{t+1}$ given $U_t^{t+1}$ depends (through the functions $m_X$, $\sigma^2_X$) on the contemporaneous value $U_{t+1}$ of the state process. Otherwise, the noncausality Assumption 5.2 can be reinforced by assuming no instantaneous causality from $X$ to $U$. In this case, $\ell(X_t \mid U_1^T) = \ell(X_t \mid U_1^{t-1})$; it is this property which ensures that short-term stochastic discount factors are predetermined, so the bond pricing formula becomes preference-free:

$$B(t, T) = E_t \prod_{\tau=t}^{T-1} B(\tau, \tau + 1).$$
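To make the expression for $\tilde B(t, T)$ and its factorization (5.11) concrete, the following sketch evaluates $\tilde B$ along one given path of a finite-state variable. The finite state space, the arrays `lam`, `m_x`, `s2_x` (with moments indexed by the next-period state) and all numerical values are hypothetical inputs chosen for illustration; in particular `lam` is simply supplied here rather than solved for.

```python
import numpy as np

def b_tilde(path, lam, m_x, s2_x, beta, alpha, rho):
    """B-tilde(t, T) along a path (u_t, ..., u_T) of integer state indices."""
    gamma = alpha / rho
    path = np.asarray(path, dtype=int)
    nxt, cur = path[1:], path[:-1]
    # a_t^T(gamma): product of [(1 + lambda(u_{tau+1})) / lambda(u_tau)]^(gamma - 1)
    a = np.prod(((1.0 + lam[nxt]) / lam[cur]) ** (gamma - 1.0))
    # Conditional log-normal term in the consumption growth moments
    expo = (alpha - 1.0) * m_x[nxt].sum() + 0.5 * (alpha - 1.0) ** 2 * s2_x[nxt].sum()
    return beta ** (gamma * len(nxt)) * a * np.exp(expo)

# Hypothetical inputs (not calibrated, not taken from the chapter).
lam = np.array([22.0, 19.0])
m_x = np.array([0.02, 0.0])
s2_x = np.array([0.01**2, 0.03**2])
params = dict(beta=0.95, alpha=-1.0, rho=0.5)

path = [0, 1, 1, 0]
full = b_tilde(path, lam, m_x, s2_x, **params)
# Factorization (5.11): product of the one-period factors along the same path
steps = np.prod([b_tilde(path[i:i + 2], lam, m_x, s2_x, **params)
                 for i in range(len(path) - 1)])
print(full, steps)
```

The bond price $B(t, T)$ itself would then be an average of such path-wise values over the conditional distribution of future states, which this sketch does not attempt.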

Of course this does not necessarily cancel the term premiums but it makes them preference-free in the sense that the role of preference parameters is fully hidden in short-term bond prices. Moreover, when there is no interest rate risk because the consumption growth rates $X_t$ are serially independent, it is straightforward to check that constant $m_{Xt+1}$ and $\sigma^2_{Xt+1}$ imply constant $\lambda(\cdot)$ and in turn $\tilde B(t, T) = B(t, T)$, with zero term premiums.

5.2.2 The pricing of stocks

The stock price formula is given by:

$$S_t = E_t\left[\beta^{\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{\alpha-1}\left(\frac{1 + \lambda(U_1^{t+1})}{\lambda(U_1^{t})}\right)^{\gamma-1}(S_{t+1} + D_{t+1})\right].$$

By a recursive argument, this Euler condition can be rewritten as follows:

$$E_t\left[\beta^{\gamma(T-t)}\, a_t^T(\gamma)\, b_t^T \left(\frac{C_T}{C_t}\right)^{\alpha-1} \frac{D_T}{D_t}\right] = 1, \tag{5.13}$$

with: $b_t^T = \prod_{\tau=t}^{T-1} (1 + \varphi(U_1^{\tau+1}))/\varphi(U_1^{\tau})$. Under the conditional log-normality Assumption 5.4, we obtain:

$$E_t\left[\tilde B(t, T)\, b_t^T \exp\left(\sum_{\tau=t+1}^{T} m_{Y\tau} + \frac{1}{2}\sum_{\tau=t+1}^{T} \sigma^2_{Y\tau} + (\alpha - 1)\sum_{\tau=t+1}^{T} \sigma_{XY\tau}\right)\right] = 1. \tag{5.14}$$
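The passage from (5.13) to (5.14) rests on the moment generating function of a bivariate normal vector. A sketch of the computation, conditional on the whole path $U_1^T$ and using the independence across dates stated in Assumption 5.1, could read:

```latex
\begin{align*}
E\left[\left(\frac{C_T}{C_t}\right)^{\alpha-1}\frac{D_T}{D_t} \,\Big|\, U_1^T\right]
 &= \prod_{\tau=t+1}^{T} E\left[\exp\big((\alpha-1)X_\tau + Y_\tau\big) \mid U_1^T\right] \\
 &= \exp\Big(\sum_{\tau=t+1}^{T}\Big[(\alpha-1)m_{X\tau} + m_{Y\tau}
    + \tfrac12(\alpha-1)^2\sigma^2_{X\tau} + \tfrac12\sigma^2_{Y\tau}
    + (\alpha-1)\sigma_{XY\tau}\Big]\Big).
\end{align*}
```

The terms in $m_{X\tau}$ and $\sigma^2_{X\tau}$, together with $\beta^{\gamma(T-t)} a_t^T(\gamma)$, reproduce $\tilde B(t, T)$ after shifting the summation index from $\tau = t+1, \ldots, T$ to $\tau = t, \ldots, T-1$; the remaining terms are those displayed in (5.14).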


With the definitional equation:

$$E\left[\frac{S_T}{S_t} \,\Big|\, U_1^T\right] = \frac{\varphi(U_1^T)}{\varphi(U_1^t)} \exp\left(\sum_{\tau=t+1}^{T} m_{Y\tau} + \frac{1}{2}\sum_{\tau=t+1}^{T} \sigma^2_{Y\tau}\right), \tag{5.15}$$

a useful way of writing the stock pricing formula is:

$$E_t[Q_{XY}(t, T)] = 1, \tag{5.16}$$

where:

$$Q_{XY}(t, T) = \tilde B(t, T)\, b_t^T \exp\left((\alpha - 1)\sum_{\tau=t+1}^{T} \sigma_{XY\tau}\right) E\left[\frac{S_T}{S_t} \,\Big|\, U_1^T\right] \frac{\varphi(U_1^t)}{\varphi(U_1^T)}. \tag{5.17}$$

To understand the role of the factor $Q_{XY}(t, T)$, it is useful to notice that it can be factorized:

$$Q_{XY}(t, T) = \prod_{\tau=t}^{T-1} Q_{XY}(\tau, \tau + 1),$$

and that there is an important particular case where $Q_{XY}(\tau, \tau + 1)$ is known at time $\tau$ and therefore equal to one by (5.16). This is when there is no leverage effect in the sense that $\ell(X_t, Y_t \mid U_1^T) = \ell(X_t, Y_t \mid U_1^{t-1})$. This means not only that there is no leverage effect for either $X$ or $Y$, but also that the instantaneous covariance $\sigma_{XYt}$ itself does not depend on $U_t$. In this case, we have $Q_{XY}(t, T) = 1$. Since we also have $\tilde B(\tau, \tau + 1) = B(\tau, \tau + 1)$, we can express the conditional expected stock return as:

$$E\left[\frac{S_T}{S_t} \,\Big|\, U_1^T\right] = \frac{1}{\prod_{\tau=t}^{T-1} B(\tau, \tau + 1)}\, \frac{1}{b_t^T}\, \frac{\varphi(U_1^T)}{\varphi(U_1^t)} \exp\left((1 - \alpha)\sum_{\tau=t+1}^{T} \sigma_{XY\tau}\right).$$

For pricing over one period ($t$ to $t + 1$), this formula provides the agent's expectation of the next period return (since in this case the only relevant information is $U_1^t$):

$$E\left[\frac{S_{t+1}}{S_t}\, \frac{1 + \varphi(U_1^{t+1})}{\varphi(U_1^{t+1})} \,\Big|\, U_1^t\right] = \frac{1}{B(t, t + 1)} \exp[(1 - \alpha)\sigma_{XYt+1}],$$

that is:

$$E\left[\frac{S_{t+1} + D_{t+1}}{S_t} \,\Big|\, U_1^t\right] = \frac{1}{B(t, t + 1)} \exp[(1 - \alpha)\sigma_{XYt+1}]. \tag{5.18}$$

This is a particularly striking result since it is very close to a standard conditional CAPM equation, which remains true for any value of the preference parameters $\alpha$ and $\rho$. While Epstein and Zin (1991) emphasize that the CAPM obtains for $\alpha = 0$ (logarithmic utility) or $\rho = 1$ (infinite elasticity of intertemporal substitution), we stress here that the relation is obtained under a particular stochastic setting for any values of $\alpha$ and $\rho$. Remarkably, the stochastic setting without leverage effect which produces this CAPM relationship will also produce most standard option pricing models (for example Black and Scholes (1973) and Hull and White (1987)), which are of course preference-free.20

5.2.3 A generalized option pricing formula

The Euler condition for the price of a European option is given by:

$$\pi_t = E_t\left[\beta^{\gamma(T-t)}\left(\frac{C_T}{C_t}\right)^{\alpha-1} \prod_{\tau=t}^{T-1}\left[\frac{1 + \lambda(U_1^{\tau+1})}{\lambda(U_1^{\tau})}\right]^{\gamma-1} \mathrm{Max}[0, S_T - K]\right]. \tag{5.19}$$

It is worth noting that the option pricing formula (5.19) is path-dependent with respect to the state variables; it depends not only on the initial and terminal values of the process $U_t$ but also on its intermediate values.21 Indeed, it is not so surprising that when preferences are not time-separable ($\gamma \ne 1$), the option price may depend on the whole past of the state variables. Using Assumptions 5.1, 5.2 and 5.4, we arrive at an extended Black–Scholes formula:

$$\frac{\pi_t}{S_t} = E_t\left\{Q^*_{XY}(t, T)\,\Phi(d_1) - \frac{K \tilde B(t, T)}{S_t}\,\Phi(d_2)\right\}, \tag{5.20}$$

where:

$$d_1 = \frac{\log\left[\dfrac{S_t\, Q^*_{XY}(t, T)}{K \tilde B(t, T)}\right]}{\left(\sum_{\tau=t+1}^{T} \sigma^2_{Y\tau}\right)^{1/2}} + \frac{1}{2}\left(\sum_{\tau=t+1}^{T} \sigma^2_{Y\tau}\right)^{1/2},$$

$$d_2 = d_1 - \left(\sum_{\tau=t+1}^{T} \sigma^2_{Y\tau}\right)^{1/2}, \quad\text{and}$$

$$Q^*_{XY}(t, T) = Q_{XY}(t, T)\, \frac{\varphi(U_1^T)}{\varphi(U_1^t)\, b_t^T}. \tag{5.21}$$
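The structure of (5.20) and (5.21) can be sketched in code as a conditional Black–Scholes-type price that is then averaged over scenarios for the state path. In the sketch below, the function names and the scenario representation (a list of probability-weighted triples $Q^*_{XY}$, $\tilde B$, cumulated variance) are illustrative assumptions, not part of the chapter; the numerical values are hypothetical.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def conditional_price(s, k, q_star, b_tilde, cum_var):
    """Term inside the expectation in (5.20), multiplied by S_t.

    q_star  : Q*_XY(t, T) for one state path
    b_tilde : B-tilde(t, T) for the same path
    cum_var : sum of sigma^2_Y over tau = t+1, ..., T
    """
    sd = sqrt(cum_var)
    d1 = log(s * q_star / (k * b_tilde)) / sd + 0.5 * sd
    d2 = d1 - sd
    return s * q_star * norm_cdf(d1) - k * b_tilde * norm_cdf(d2)

def mixed_price(s, k, scenarios):
    """Average over scenarios given as (probability, q_star, b_tilde, cum_var)."""
    return sum(p * conditional_price(s, k, q, b, v) for p, q, b, v in scenarios)

# With Q* = 1, B-tilde = exp(-r T) and cum_var = sigma^2 T (assumed values),
# the conditional price is the usual Black-Scholes call price.
bs = conditional_price(100.0, 100.0, 1.0, exp(-0.05), 0.2**2)

# Two hypothetical volatility regimes with equal probability.
mix = mixed_price(100.0, 100.0, [(0.5, 1.0, exp(-0.05), 0.15**2),
                                 (0.5, 1.0, exp(-0.05), 0.25**2)])
print(bs, mix)
```

A finite list of scenarios stands in for the conditional expectation $E_t$ over state paths; in practice that expectation would typically be approximated by simulation of the state process.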

To put this general formula in perspective, we will compare it to the three main approaches that have been used for pricing options: equilibrium option pricing, arbitrage-based option pricing, and GARCH option pricing. The latter pricing model can be set either in an equilibrium framework or in an arbitrage framework. Concerning the equilibrium approach, our setting is more general than the usual expected utility framework since it accommodates non-separable preferences. The stochastic framework with latent variables could also accommodate state-dependent preferences such as habit formation based on state variables.

20 A similar parallel is drawn in an unconditional two-period framework in Breeden and Litzenberger (1978).

21 Since we assume that the state variable process is Markovian, $\lambda(U_1^T)$ does not depend on the whole path of state variables but only on the last values $U_T$.

Of course, the most popular option pricing formulas among practitioners are based on arbitrage rather than on equilibrium, in particular in order to avoid the specification of preferences. From the start, it should be stressed that our general formula (5.20) nests a large number of preference-free extensions of the Black–Scholes formula. In particular if $Q_{XY}(t, T) = 1$ and $\tilde B(t, T) = \prod_{\tau=t}^{T-1} B(\tau, \tau + 1)$, one can see that the option price (5.20) is nothing but the conditional expectation of the Black–Scholes price,22 where the expectation is computed with respect to the joint probability distribution of the rolling-over interest rate $r_{t,T} = -\sum_{\tau=t}^{T-1} \log B(\tau, \tau + 1)$ and the cumulated volatility $\sigma_{t,T} = \sqrt{\sum_{\tau=t+1}^{T} \sigma^2_{Y\tau}}$. This framework nests three well-known models. First, the most basic ones, the Black and Scholes (1973) and Merton (1973) formulas, when interest rates and volatility are deterministic. Second, the Hull and White (1987) stochastic volatility extension, since $\sigma^2_{t,T} = \mathrm{Var}\left[\log\frac{S_T}{S_t} \mid U_1^T\right]$ corresponds to the cumulated volatility $\int_t^T \sigma^2_u\, du$ in the Hull and White continuous-time setting.23 Third, the formula allows for stochastic interest rates as in Turnbull and Milne (1991) and Amin and Jarrow (1992).

However, the usefulness of our general formula (5.20) comes above all from the fact that it offers an explicit characterization of instances where the preference-free paradigm cannot be maintained. Usually, preference-free option pricing is underpinned by the absence of arbitrage in a complete market setting. However, our equilibrium-based option pricing does not preclude incompleteness and points out in which cases this incompleteness will invalidate the preference-free paradigm.
22 We refer here to a BS option pricing formula where dividend flows arrive during the lifetime of the option and are accounted for in the definition of the risk neutral probability, while the option payoff does not include dividends. In other words, the BS option price is given by:

$$\pi_t^{BS} = e^{-r(T-t)} \hat E_t[\mathrm{Max}(0, S_T - K)] \tag{5.22}$$

$$\phantom{\pi_t^{BS}} = e^{-\delta(T-t)} S_t \Phi(d_1) - K e^{-r(T-t)} \Phi(d_2), \tag{5.23}$$

since in the risk neutral world:

$$\log\frac{S_T}{S_t} \sim \mathcal{N}\left((r - \delta)(T - t),\ \sigma^2 (T - t)\right), \tag{5.24}$$

where $\delta$ is the intensity of the dividend flow.

23 See Subsection 5.3 for a detailed comparison between standard stochastic volatility models and our state variable framework.

The only cases of incompleteness which matter in this respect occur precisely when at least one of the two following conditions:

$$Q_{XY}(t, T) = 1 \tag{5.25}$$

$$\tilde B(t, T) = \prod_{\tau=t}^{T-1} B(\tau, \tau + 1) \tag{5.26}$$

is not fulfilled. In general, preference parameters appear explicitly in the option pricing formula through $\tilde B(t, T)$ and $Q_{XY}(t, T)$. However, in so-called preference-free formulas, it happens that these parameters are eliminated from the option pricing formula through the observation of the bond price and the stock price. In other words, even in an equilibrium framework with incomplete markets, option pricing is preference-free if and only if there is no leverage effect in the general sense that $Q_{XY}(t, t + 1)$ and $\tilde B(t, t + 1)$ are predetermined. This result generalizes Amin and Ng (1993), who called this effect predictability.

It is worth noting that our results of equivalence between preference-free option pricing and no instantaneous causality between state variables and asset returns are consistent with another strand of the option pricing literature, namely GARCH option pricing. Duan (1995) derived it first in an equilibrium framework, but Kallsen and Taqqu (1998) have shown that it could be obtained with an arbitrage argument. Their idea is to complete the markets by inserting the discrete-time model into a continuous-time one, where conditional variance is constant between two integer dates. They show that such a continuous-time embedding makes possible arbitrage pricing which is per se preference-free. It is then clear that preference-free option pricing is incompatible with the presence of an instantaneous causality effect, since it is such an effect that prevents the embedding used by Kallsen and Taqqu (1998).

5.3 A comparison with stochastic volatility models

The typical stochastic volatility model (SV model hereafter) introduces a positive stochastic process such that its squared value $h_t$ represents the conditional variance of the value at time $(t + 1)$ of a second-order stationary process of interest, given a conditioning information set $J_t$. In our setting, it is natural to define the conditioning information set $J_t$ by (5.7). It means that the information available at time $t$ is not summarized in general by the observation of past and current values of asset prices, since it also encompasses additional information through state variables $U_t$. Such a definition is consistent with the modern definition of SV processes (see Ghysels, Harvey and Renault, 1996, for a survey). It incorporates unobserved components that might capture well-documented evidence about conditional leptokurtosis and leverage effects of asset returns (given past and current returns). Moreover, such unobserved components are included in the relevant conditioning information set for option pricing models as in Hull and White (1987).

The focus of interest in this subsection is the time series properties of asset returns implied by the dynamic asset pricing model presented in Section 5.1. These time series of returns can be seen as stochastic volatility processes by Assumption 5.4 on the conditional probability distribution of the fundamentals $(X_{t+1}, Y_{t+1})$ given $J_t$. We focus on $(X_{t+1}, Y_{t+1})$ instead of asset returns since, by (5.9), the joint conditional probability distribution (given $U_1^{t+1}$) of returns for the two primitive assets is defined by Assumption 5.4 up to a shift in the mean. Let us first consider the univariate dynamics in terms of the innovation process $\eta_{Yt+1}$ of $Y_{t+1}$ with respect to $J_t$ defined as:

$$\eta_{Yt+1} = Y_{t+1} - E[m_Y(U_1^{t+1}) \mid U_1^t]. \tag{5.27}$$

The associated volatility and kurtosis dynamics are then characterized by:

$$h_t^Y = \mathrm{Var}[\eta_{Yt+1} \mid U_1^t] = \mathrm{Var}[m_Y(U_1^{t+1}) \mid U_1^t] + E[\sigma^2_Y(U_1^{t+1}) \mid U_1^t] \tag{5.28}$$

and

$$\mu^Y_{4t} = E[\eta^4_{Yt+1} \mid U_1^t] = 3E[\sigma^4_Y(U_1^{t+1}) \mid U_1^t] = 3\left[\mathrm{Var}[\sigma^2_Y(U_1^{t+1}) \mid U_1^t] + \left(E[\sigma^2_Y(U_1^{t+1}) \mid U_1^t]\right)^2\right]. \tag{5.29}$$

As far as kurtosis is concerned, Equations (5.28) and (5.29) provide a representation of the fat-tail effect and its dynamics, sometimes termed the heterokurtosis effect. This extends the representation of the standard mixture model, first introduced by Clark (1973) and extended by Gallant, Hsieh and Tauchen (1991). Indeed, in the particular case where:

$$\mathrm{Var}[m_Y(U_1^{t+1}) \mid U_1^t] = 0, \tag{5.30}$$

we get the following expression24 for the conditional kurtosis coefficient:

$$\frac{\mu^Y_{4t}}{(h_t^Y)^2} = 3\left[1 + (c_t^Y)^2\right] \tag{5.31}$$

with:

$$c_t^Y = \frac{\left(\mathrm{Var}[\sigma^2_Y(U_1^{t+1}) \mid U_1^t]\right)^{1/2}}{E[\sigma^2_Y(U_1^{t+1}) \mid U_1^t]}. \tag{5.32}$$

This expression emphasizes that the conditional normality assumption does not preclude conditional leptokurtosis with respect to a smaller set of conditioning information. It should be emphasized that formula (5.31) allows for even more leptokurtosis than the standard formula since the probability distributions considered are still conditioned on a large information set, including possibly unobserved components. An additional projection on the reduced information set defined by past and current values of observed asset returns will increase the kurtosis coefficient. In other words, our model allows for innovation terms in asset returns that, even standardized by a genuine stochastic volatility (including a mixture effect), are still leptokurtic. Moreover, condition (5.30) is likely not to hold, providing an additional degree of freedom in our representation of kurtosis dynamics.

24 It corresponds to the formula given by Gallant, Hsieh and Tauchen (1991) on page 204.

If we consider the stock return itself instead of the dividend growth, the violation of (5.30) is even more likely since $m_Y(U_1^{t+1})$ is to be replaced by the "expected" return $m_Y(U_1^{t+1}) + \log[(\varphi(U_1^{t+1}) + 1)/\varphi(U_1^t)]$. Condition (5.30) will be violated when this expected return differs from its expected value computed by investors according to our equilibrium asset pricing model, that is $E[m_Y(U_1^{t+1}) + \log[(\varphi(U_1^{t+1}) + 1)/\varphi(U_1^t)] \mid U_1^t]$. We will show now that it is precisely this difference which can produce a genuine leverage effect in stock returns, as defined by Black (1976) and Nelson (1991) for conditionally heteroskedastic returns.25 This justifies a posteriori the use of the expression leverage effect in Section 5.2 to account for the fact that the probability distribution of $(X_{t+1}, Y_{t+1})$ given $U_1^{t+1}$ depends (through the functions $m_X$, $m_Y$, $\sigma_X$, $\sigma_Y$ and $\sigma_{XY}$) on the contemporaneous value $U_{t+1}$ of the state process.26 According to the standard terminology, the stochastic volatility dividend process exhibits a leverage effect if and only if:

$$\mathrm{Cov}[\eta_{Yt+1}, h^Y_{t+1} \mid U_1^t] = \mathrm{Cov}[m_Y(U_1^{t+1}), h^Y_{t+1} \mid U_1^t] < 0. \tag{5.33}$$

25 We will conduct the discussion below in terms of $m_Y(U_1^{t+1})$ but it could be reinterpreted in terms of $m_Y(U_1^{t+1}) + \log[(\varphi(U_1^{t+1}) + 1)/\varphi(U_1^t)]$.

26 The key point is that the mean functions $m_X(U_1^{t+1})$ and $m_Y(U_1^{t+1})$ depend on $U_{t+1}$. However, if these functions are replaced by the shifted conditional expectations for asset returns according to (5.9), the functions $\sigma_X(U_1^{t+1})$, $\sigma_Y(U_1^{t+1})$ and $\sigma_{XY}(U_1^{t+1})$ will be reintroduced in these expected returns through the functions $\lambda(U_1^{t+1})$ and $\varphi(U_1^{t+1})$ defined by Proposition 5.3.

Barring the restriction (5.30), if $m_Y(U_1^{t+1})$ is truly a function of $U_{t+1}$, the condition in (5.33) amounts to the negativity of the sum of two terms:

$$\mathrm{Cov}\left[m_Y(U_1^{t+1}),\ \mathrm{Var}[m_Y(U_1^{t+2}) \mid U_1^{t+1}] \mid U_1^t\right] \tag{5.34}$$

and:

$$\mathrm{Cov}\left[m_Y(U_1^{t+1}),\ E[\sigma^2_Y(U_1^{t+2}) \mid U_1^{t+1}] \mid U_1^t\right]. \tag{5.35}$$

In other words, the leverage effect of the stochastic volatility process $Y_{t+1}$ can be produced by either of the two following leverage effects, or both.27 The conditional mean process $m_Y(U_1^{t+1})$ may be a stochastic volatility process which features a leverage effect defined by the negativity of (5.34). Or the process $Y_{t+1}$ itself may be characterized by a leverage effect, in which case (5.35) is negative; this means that bad news about expected returns (when $m_Y(U_1^{t+1})$ is smaller than its unconditional expectation) implies on average a higher expected volatility of $Y$, that is a value of $E[\sigma^2_Y(U_1^{t+2}) \mid U_1^{t+1}]$ greater than its unconditional mean.

27 This decomposition of the leverage effect in two terms is the exact analogue of the decomposition discussed in Fiorentini and Sentana (1998) and Meddahi (1999) for persistence.

To summarize, Assumption 5.4 not only allows us to capture the standard features of a stochastic volatility model (in terms of heavy tails and leverage effects) but also provides for a richer set of possible dynamics. Moreover, we can certainly extend these ideas to multivariate dynamics either for the joint behavior of market and stock returns or for any portfolio consideration. For instance, the dependence of $\sigma_{XY}(U_1^{t+1})$ on the whole set of state variables offers great flexibility to model the stochastic behavior of correlation coefficients, as recently put forward empirically by Andersen et al. (1999). This last feature is clearly highly relevant for asset allocation or conditional beta pricing models.
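The kurtosis formula (5.31)–(5.32) can be checked numerically in the simplest mixture case. The sketch below assumes, purely for illustration, that condition (5.30) holds and that the conditional variance $\sigma^2_Y$ takes one of two hypothetical values with given probabilities; the innovation is then a normal variance mixture, and its kurtosis is computed both directly from $3E[\sigma^4]/(E[\sigma^2])^2$ and through the coefficient of variation $c$, with a simulation as a further cross-check.

```python
import numpy as np

def mixture_kurtosis(variances, probs):
    """Kurtosis of a zero-mean normal variance mixture, computed two ways."""
    v, p = np.asarray(variances, float), np.asarray(probs, float)
    mean_v = p @ v                      # E[sigma^2], plays the role of h_t^Y
    var_v = p @ (v - mean_v) ** 2       # Var[sigma^2]
    c = np.sqrt(var_v) / mean_v         # coefficient of variation, as in (5.32)
    direct = 3.0 * (p @ v**2) / mean_v**2
    return direct, 3.0 * (1.0 + c**2)   # second value is formula (5.31)

def simulated_kurtosis(variances, probs, n=500_000, seed=0):
    """Monte Carlo estimate of E[eta^4] / (E[eta^2])^2 for the same mixture."""
    rng = np.random.default_rng(seed)
    v = rng.choice(np.asarray(variances, float), size=n, p=probs)
    eta = np.sqrt(v) * rng.standard_normal(n)
    return np.mean(eta**4) / np.mean(eta**2) ** 2

direct, formula = mixture_kurtosis([1.0, 3.0], [0.5, 0.5])
sim = simulated_kurtosis([1.0, 3.0], [0.5, 0.5])
print(direct, formula, sim)
```

With equally likely variances 1 and 3, both analytical expressions equal 3.75, above the Gaussian value of 3, which illustrates how conditional normality given the state is compatible with leptokurtosis given a smaller information set.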

6 Conclusion

In this chapter, we provided a unifying analysis of latent variable models in finance through the concept of stochastic discount factor (SDF). We extended both the asset pricing factor models and the equilibrium dynamic asset pricing models through a conditioning on state variables. This conditioning enriches the dynamics of asset returns through instantaneous causality between the asset returns and the latent variables. Such correlation or leverage effects explain departures from usual CAPM pricing for stocks or Black and Scholes and Hull and White pricing for options. The dependence of conditional covariances on the state variables allows for a rich dynamic stochastic behavior of correlation coefficients which is important for asset allocation or value-at-risk strategies.

The enriched set of empirical implications from such dynamic latent variable models requires us to set up a general inference methodology which will account for the unobservability of both cross-sectional factors and longitudinal latent variables. Indirect inference, efficient method of moments or Markov chain Monte Carlo (MCMC) for Bayesian inference are all avenues that can prove useful in this context, since they have been used successfully in stochastic volatility models.

References

Amin, K.I. and Jarrow, R. (1992), Pricing options in a stochastic interest rate economy, Mathematical Finance, 3(3), 1–21.
Amin, K.I. and Ng, V.K. (1993), Option valuation with systematic stochastic volatility, Journal of Finance, XLVIII, 3, 881–909.
Andersen, T.B., Bollerslev, T., Diebold, F.X. and Labys, P. (1999), The distribution of exchange rate volatility, NBER Working Paper no. 6961.
Bansal, R., Hsieh, D. and Viswanathan, S. (1993), No arbitrage and arbitrage pricing: a new approach, Journal of Finance 48, 1231–62.
Bartholomew, D.J. (1987), Latent Variable Models and Factor Analysis. Oxford University Press, Oxford.
Black, F. (1976), Studies of stock market volatility changes, 1976 Proceedings of the American Statistical Association, Business and Economic Statistics Section, pp. 177–81.
Black, F. and Scholes, M. (1973), The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–59.
Breeden, D. and Litzenberger, R. (1978), Prices of state-contingent claims implicit in option prices, Journal of Business 51, 621–51.
Burt, C. (1941), The Factors of the Mind: An Introduction to Factor Analysis in Psychology. Macmillan, New York.
Chamberlain, G. and Rothschild, M. (1983), Arbitrage and mean variance analysis on large asset markets, Econometrica 51, 1281–304.
Clark, P.K. (1973), A subordinated stochastic process model with variance for speculative prices, Econometrica 41, 135–56.
Cox, D.R. (1981), Statistical analysis of time series: some recent developments, Scandinavian Journal of Statistics 8, 93–115.
Cox, J., Ingersoll, J. and Ross, S. (1981), A reexamination of traditional hypotheses about the term structure of interest rates, Journal of Finance 36, 769–99.
Dai, Q. and Singleton, K.J. (1999), Specification analysis of term structure models, forthcoming in the Journal of Finance.
Diebold, F.X. and Nerlove, M. (1989), The dynamics of exchange rate volatility: a multivariate latent factor ARCH model, Journal of Applied Econometrics 4, 1–21.
Duan, J.C. (1995), The GARCH option pricing model, Mathematical Finance 5, 13–32.
Duffie, D. and Kan, R. (1996), A yield-factor model of interest rates, Mathematical Finance, 379–406.
Engle, R.F., Ng, V. and Rothschild, M. (1990), Asset pricing with a factor ARCH covariance structure: empirical estimates with treasury bills, Journal of Econometrics 45, 213–38.
Epstein, L. and Zin, S. (1989), Substitution, risk aversion and the temporal behavior of consumption and asset returns I: a theoretical framework, Econometrica 57, 937–69.
Epstein, L. and Zin, S. (1991), Substitution, risk aversion and the temporal behavior of consumption and asset returns I: an empirical analysis, Journal of Political Economy 99, 2, 263–86.
Ferson, W.E. and Korajczyk, R.A. (1995), Do arbitrage pricing models explain the predictability of stock returns, Journal of Business 68, 309–49.
Fiorentini, G. and Sentana, E. (1998), Conditional means of time series processes and time series processes for conditional means, International Economic Review 39, 1101–18.
Florens, J.-P. and Mouchart, M. (1982), A note on noncausality, Econometrica 50(3), 583–91.
Florens, J.-P., Mouchart, M. and J.-Rollin, P. (1990), Elements of Bayesian Statistics. Dekker, New York.
Gallant, A.R., Hsieh, D. and Tauchen, G. (1991), On fitting a recalcitrant series: the pound/dollar exchange rate 1974–1983, Nonparametric and Semiparametric Methods in Econometrics and Statistics, (eds. William Barnett, A., Jim Powell and Georges Tauchen), Cambridge University Press, Cambridge.
Garcia, R., Luger, R. and Renault, E. (1999), Asymmetric smiles, leverage effects and structural parameters, working paper, CIRANO, Montreal, Canada.
Ghysels, E., Harvey, A. and Renault, E. (1996), Stochastic Volatility, Statistical Methods in Finance (C. Rao, R. and Maddala, G.S.). North-Holland, Amsterdam, pp. 119–91.
Granger, C.W.J. (1969), Investigating causal relations by econometric models and cross-spectral methods, Econometrica 37, 424–38.
Hamilton, J.D. (1989), A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica 57, 357–84.
Hansen, L. and Richard, S. (1987), The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models, Econometrica 55, 587–614.
Harrison, J.M. and Kreps, D. (1979), Martingale and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
Harvey, A. (1989), Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge.
Harvey, C.R. (1991), The world price of covariance risk, Journal of Finance 46, 111–57.
Hull, J. and White, A. (1987), The pricing of options on assets with stochastic volatilities, Journal of Finance XLII, 281–300.
Kallsen, J. and Taqqu, M.S. (1998), Option pricing in ARCH-type models, Mathematical Finance, 13–26.
King, M., Sentana, E. and Wadhwani, S. (1994), Volatility and links between national stock markets, Econometrica 62, 901–33.
Lintner, J. (1965), The valuation of risk assets and the selection of risky investments in stock portfolio and capital budgets, Review of Economics and Statistics 47, 13–37.
Kreps, D. and Porteus, E. (1978), Temporal resolution of uncertainty and dynamic choice theory, Econometrica 46, 185–200.
Lucas, R. (1978), Asset prices in an exchange economy, Econometrica 46, 1429–45.
Meddahi, N. (1999), Aggregation of long memory processes, unpublished paper, Université de Montréal.
Meddahi, N. and Renault, E. (1996), Aggregation and marginalization of GARCH and stochastic volatility models, GREMAQ DP 96.30.433, Toulouse.
Merton, R.C. (1973), Rational theory of option pricing, Bell Journal of Economics and Management Science 4, 141–83.
Nelson, D.B. (1991), Conditional heteroskedasticity in asset returns: a new approach, Econometrica 59, 347–70.
Pitt, M.K. and Shephard, N. (1999), Time-varying covariances: a factor stochastic volatility approach, Bayesian Statistics 6, 547–70.
Renault, E. (1999), Dynamic Factor Models in Finance, Core Lectures. Oxford University Press, Oxford, forthcoming.
Ross, S. (1976), The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–60.
Sharpe, W.F. (1964), Capital asset prices: a theory of market equilibrium under conditions of risk, Journal of Finance 19, 425–42.
Sims, C.A. (1972), Money, income and causality, American Economic Review 62, 540–52.
Spearman, C. (1927), The Abilities of Man. Macmillan, New York.
Turnbull, S. and Milne, F. (1991), A simple approach to interest-rate option pricing, Review of Financial Studies 4, 87–121.

6 Monte Carlo Methods for Security Pricing∗

Phelim Boyle, Mark Broadie and Paul Glasserman

1 Introduction

In recent years the complexity of numerical computation in financial theory and practice has increased enormously, putting more demands on computational speed and efficiency. Numerical methods are used for a variety of purposes in finance. These include the valuation of securities, the estimation of their sensitivities, risk analysis, and stress testing of portfolios. The Monte Carlo method is a useful tool for many of these calculations, as evidenced in part by the voluminous literature of successful applications. For a brief sampling, the reader is referred to the stochastic volatility applications in Duan (1995), Hull and White (1987), Johnson and Shanno (1987), and Scott (1987);¹ the valuation of mortgage-backed securities in Schwartz and Torous (1989); the valuation of path-dependent options in Kemna and Vorst (1990); the portfolio optimization in Worzel et al. (1994); and the valuation of interest-rate derivative claims in Carverhill and Pang (1995). In this paper we focus on recent methodological developments. We review the Monte Carlo approach and describe some recent applications in the finance area.

In modern finance, the prices of the basic securities and the underlying state variables are often modelled as continuous-time stochastic processes. A derivative security, such as a call option, is a security whose payoff depends on one or more of the basic securities. Using the assumption of no arbitrage, financial economists have shown that the price of a generic derivative security can be expressed as the expected value of its discounted payouts. This expectation is taken with respect to a transformation of the original probability measure known as the equivalent martingale measure or the risk-neutral measure. The book by Duffie (1996) provides an excellent account of this material.

The Monte Carlo method lends itself naturally to the evaluation of security prices represented as expectations.
∗ Reprinted from the Journal of Economic Dynamics and Control 21 (1997) 1267–1321.
¹ Wiggins (1987) also studies pricing under stochastic volatility but does not use Monte Carlo simulation.

Generically, the approach consists of the following steps:

• Simulate sample paths of the underlying state variables (e.g., underlying asset prices and interest rates) over the relevant time horizon. Simulate these according to the risk-neutral measure.
• Evaluate the discounted cash flows of a security on each sample path, as determined by the structure of the security in question.
• Average the discounted cash flows over sample paths.

In effect, this method computes a multi-dimensional integral – the expected value of the discounted payouts over the space of sample paths. The increase in the complexity of derivative securities in recent years has led to a need to evaluate high dimensional integrals. Monte Carlo becomes increasingly attractive compared to other methods of numerical integration as the dimension of the problem increases.

Consider the integral of the function $f(x)$ over the d-dimensional unit hypercube. The simple (or crude) Monte Carlo estimate of the integral is equal to the average value of the function $f$ over n points selected at random² from the unit hypercube. From the strong law of large numbers this estimate converges to the true value of the integral as n tends to infinity. In addition, the central limit theorem assures us that the standard error³ of the estimate tends to zero as $1/\sqrt{n}$. Thus the error convergence rate is independent of the dimension of the problem and this is the dominant advantage of the method over classical numerical integration approaches. The only restriction on the function $f$ is that it should be square integrable, and this is a relatively mild restriction. Furthermore, the Monte Carlo method is flexible and easy to implement and modify. In addition, the increased availability of powerful computers has enhanced the attractiveness of the method.

There are some disadvantages of the method but in recent years progress has been made in overcoming them. One drawback is that for very complex problems a large number of replications may be required to obtain precise results.
² In standard Monte Carlo applications the n points are usually not truly random but are generated by a deterministic algorithm and are described as pseudorandom numbers.
³ We can readily estimate the variance of the Monte Carlo estimate by using the same set of n random numbers to estimate the expected value of $f^2$.

Different variance reduction techniques have been developed to increase precision. Two of the classical variance reduction techniques are the control variate approach and the antithetic variate method. More recently, moment matching, importance sampling, and conditional Monte Carlo methods have been introduced in finance applications.

Another technique for speeding up the valuation of multidimensional integrals uses deterministic sequences rather than random sequences. These deterministic sequences are chosen to be more evenly dispersed throughout the region of integration than random sequences. If we use these sequences to estimate multidimensional integrals we can often improve the convergence. Deterministic sequences with this property are known as low-discrepancy sequences or quasi-random sequences. Using this approach one can in theory derive deterministic error bounds, though the practical use of the bounds is problematic. In contrast, standard Monte Carlo yields simple, useful probabilistic error bounds. Although low-discrepancy sequences are well known in computational physics they have only recently been applied in finance problems. There are different procedures for generating such low-discrepancy sequences and these procedures are generally based on number theoretic methods. We describe some of the recent developments in this area. We also discuss applications of this approach to problems in finance and conduct some rough comparisons between standard Monte Carlo methods and two different quasi-random approaches.

Until recently, the valuation of American style options was widely considered outside the scope of Monte Carlo. However, Tilley (1993), Barraquand and Martineau (1995), and Broadie and Glasserman (1997) have proposed approaches to this problem, and there has been other related work as well. We provide a brief survey of the recent research progress in this area.

The layout of the paper is as follows. Variance reduction techniques are described in the next section. The ideas behind the use of low-discrepancy sequences and brief numerical comparisons with standard Monte Carlo methods are given in Section 3. Price sensitivity estimation using simulation is discussed in Section 4. Various approaches to pricing American options using simulation are briefly described in Section 5. Other issues are touched on briefly in Section 6.
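As a rough illustration of the crude Monte Carlo estimate described in this introduction, the following Python sketch averages a function over pseudorandom points in the unit hypercube and reports a standard error computed from the same points. The function names and the test integrand (a product of coordinates, whose integral is $2^{-d}$) are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def crude_monte_carlo(f, d, n, seed=0):
    """Estimate the integral of f over the d-dimensional unit hypercube.

    Returns (estimate, standard_error) based on n pseudorandom points.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        y = f(x)
        total += y
        total_sq += y * y
    mean = total / n
    # Sample variance of f, estimated from the same n points.
    var = max(total_sq / n - mean * mean, 0.0) * n / (n - 1)
    return mean, math.sqrt(var / n)


def product(x):
    """Illustrative integrand (an assumption): x_1 * ... * x_d, integral 2**-d."""
    return math.prod(x)


est_small, se_small = crude_monte_carlo(product, d=5, n=1_000)
est_large, se_large = crude_monte_carlo(product, d=5, n=100_000)
print(f"n=1000:   {est_small:.5f} +/- {se_small:.5f}")
print(f"n=100000: {est_large:.5f} +/- {se_large:.5f}  (exact {2 ** -5:.5f})")
```

With 100 times as many points the reported standard error should shrink by a factor of roughly 10, whatever the dimension d.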

2 Variance reduction techniques

In this section, we first discuss the role of variance reduction in meeting the broader objective of improving the computational efficiency of Monte Carlo simulations. We then discuss specific variance reduction techniques and illustrate their application to pricing problems.

2.1 Variance reduction and efficiency improvement

The reduction of variance seems so obviously desirable that the precise argument for its benefit is sometimes overlooked. We briefly review the underlying justification for variance reduction and examine it from the perspective of improving computational efficiency.


Suppose we want to compute a parameter $\theta$ – for example, the price of a derivative security. Suppose we can generate by Monte Carlo an i.i.d. sequence $\{\hat\theta_i, i = 1, 2, \ldots\}$, where each $\hat\theta_i$ has expectation $\theta$ and variance $\sigma^2$. A natural estimator of $\theta$ based on n replications is then the sample mean

\[ \frac{1}{n}\sum_{i=1}^{n} \hat\theta_i. \]

By the central limit theorem, for large n this sample mean is approximately normally distributed with mean $\theta$ and variance $\sigma^2/n$. Probabilistic error bounds in the form of confidence intervals follow readily from the normal approximation, and indicate that the error in the estimator is proportional to $\sigma/\sqrt{n}$. Thus, decreasing the variance $\sigma^2$ by a factor of 10, say, while leaving everything else unchanged, does as much for error reduction as increasing the number of samples by a factor of 10.

Suppose, now, that we have a choice between two types of Monte Carlo estimates which we denote by $\{\hat\theta_i^{(1)}, i = 1, 2, \ldots\}$ and $\{\hat\theta_i^{(2)}, i = 1, 2, \ldots\}$. Suppose that both are unbiased, so that $E[\hat\theta_i^{(1)}] = E[\hat\theta_i^{(2)}] = \theta$, but $\sigma_1 < \sigma_2$, where $\sigma_j^2 = \mathrm{Var}[\hat\theta^{(j)}]$, $j = 1, 2$. From our previous observations it follows that a sample mean of n replications of $\hat\theta^{(1)}$ gives a more precise estimate of $\theta$ than does a sample mean of n replications of $\hat\theta^{(2)}$. But this analysis oversimplifies the comparison because it fails to capture possible differences in the computational effort required by the two estimators. Generating n replications of $\hat\theta^{(1)}$ may be more time-consuming than generating n replications of $\hat\theta^{(2)}$; smaller variance is not sufficient grounds for preferring one estimator over another.

To compare estimators with different computational requirements as well as different variances, we argue as follows. Suppose the work required to generate one replication of $\hat\theta^{(j)}$ is a constant $b_j$, $j = 1, 2$. (In some problems, the work per replication is stochastic; assuming it is constant simplifies the discussion.) With computing time t, the number of replications of $\hat\theta^{(j)}$ that can be generated is $\lfloor t/b_j \rfloor$; for simplicity, we drop the $\lfloor\cdot\rfloor$ and treat the ratios $t/b_j$ as though they were integers. The two estimators available with computing time t are therefore

\[ \frac{b_1}{t}\sum_{i=1}^{t/b_1} \hat\theta_i^{(1)} \quad\text{and}\quad \frac{b_2}{t}\sum_{i=1}^{t/b_2} \hat\theta_i^{(2)}. \]

For large t, these are approximately normally distributed with mean $\theta$ and with standard deviations

\[ \sigma_1\sqrt{\frac{b_1}{t}} \quad\text{and}\quad \sigma_2\sqrt{\frac{b_2}{t}}. \]

Thus, for large t, the first estimator should be preferred over the second if

\[ \sigma_1^2 b_1 < \sigma_2^2 b_2. \tag{1} \]

Equation (1) provides a sound basis for trading off estimator variance and computational requirements. In light of the discussion leading to (1), it is reasonable to take the product of variance and work per run as a measure of efficiency. Using efficiency as a basis for comparison, the lower-variance estimator should be preferred only if the variance ratio $\sigma_1^2/\sigma_2^2$ is smaller than the work ratio $b_2/b_1$. By the same argument, a higher-variance estimator may actually be preferable if it takes much less time to generate.

In its simplest form, the principle expressed in (1) dates at least to Hammersley and Handscomb (1964, p.22). More recently, the idea has been substantially extended by Glynn and Whitt (1992). They allow the work per run to be random (in which case each $b_j$ is the expected work per run) and also consider efficiency in the presence of bias.
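The comparison in (1) is easy to apply mechanically. The short Python sketch below is only an illustration: the function names and the numerical inputs are hypothetical, and the work per replication is assumed constant as in the discussion above.

```python
def preferred_estimator(var1, work1, var2, work2):
    """Return 1 if estimator 1 wins under criterion (1), i.e. var1*work1 < var2*work2; else 2."""
    return 1 if var1 * work1 < var2 * work2 else 2


def std_dev_with_budget(var, work, budget):
    """Approximate standard deviation of the sample mean after computing time `budget`."""
    return (var * work / budget) ** 0.5


# Hypothetical numbers: estimator 1 has a quarter of the variance of
# estimator 2 but costs five times as much per replication.
choice = preferred_estimator(var1=1.0, work1=5.0, var2=4.0, work2=1.0)
sd1 = std_dev_with_budget(1.0, 5.0, budget=100.0)
sd2 = std_dev_with_budget(4.0, 1.0, budget=100.0)
print(choice, round(sd1, 4), round(sd2, 4))
```

In this made-up case the variance ratio 1/4 is not smaller than the work ratio 1/5, so the higher-variance estimator is the more efficient one for a fixed computing budget.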

2.2 Antithetic variates

Equipped with a basis for evaluating potential efficiency improvements, we can now consider specific variance reduction techniques. One of the simplest and most widely used techniques in financial pricing problems is the method of antithetic variates. We introduce it with a simple example, then generalize.

Consider the problem of computing the Black–Scholes price of a European call option on a no-dividend stock. Of course, there is no need to evaluate this price by simulation, but the example serves as a useful introduction. In the Black–Scholes model, the stock price follows a lognormal diffusion. Independent replications of the terminal stock price under the risk-neutral measure can be generated from the formula

\[ S_T^{(i)} = S_0 e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z_i}, \quad i = 1, \ldots, n, \tag{2} \]

where $S_0$ is the current stock price, $r$ is the riskless interest rate, $\sigma$ is the stock's volatility, $T$ is the option's maturity, and the $\{Z_i\}$ are independent samples from the standard normal distribution. See, e.g., Hull (2000) for background on this model, and see Devroye (1986) for methods of sampling from the normal distribution. Based on n replications, an unbiased estimator of the price of an option with strike $K$ is given by

\[ \hat C = \frac{1}{n}\sum_{i=1}^{n} C_i \equiv \frac{1}{n}\sum_{i=1}^{n} e^{-rT}\max\{0, S_T^{(i)} - K\}. \tag{3} \]
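A minimal Python sketch of equations (2) and (3) follows, using only the standard library. The function names, the seed, and the sample size are assumptions for illustration; the option parameters are those listed with Table 1 (K = 100, r = 0.10, T = 0.2) with S0 = 100 and σ = 0.4. The closed-form Black–Scholes value is computed alongside as a check.

```python
import math
import random
from statistics import NormalDist


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call on a no-dividend stock."""
    nd = NormalDist()
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * nd.cdf(d1) - k * math.exp(-r * t) * nd.cdf(d2)


def mc_call(s0, k, r, sigma, t, n, seed=0):
    """Estimator (3): average discounted payoff over n terminal prices from (2)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    payoffs = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp(drift + vol * z)        # equation (2)
        payoffs.append(disc * max(0.0, s_t - k))    # summand C_i of (3)
    mean = sum(payoffs) / n
    var = sum((c - mean) ** 2 for c in payoffs) / (n - 1)
    return mean, math.sqrt(var / n)


exact = black_scholes_call(100.0, 100.0, 0.10, 0.4, 0.2)
price, std_err = mc_call(100.0, 100.0, 0.10, 0.4, 0.2, n=50_000)
print(f"Monte Carlo {price:.3f} +/- {std_err:.3f}, Black-Scholes {exact:.3f}")
```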


In this context, the method of antithetic variates⁴ is based on the observation that if $Z_i$ has a standard normal distribution, then so does $-Z_i$. The price $\tilde S_T^{(i)}$ obtained from (2) with $Z_i$ replaced by $-Z_i$ is thus a valid sample from the terminal stock price distribution. Similarly, each $\tilde C_i = e^{-rT}\max\{0, \tilde S_T^{(i)} - K\}$ is an unbiased estimator of the option price, as is therefore

\[ \hat C_{AV} = \frac{1}{n}\sum_{i=1}^{n} \frac{C_i + \tilde C_i}{2}. \]

A heuristic argument for preferring $\hat C_{AV}$ notes that the random inputs obtained from the collection of antithetic pairs $\{(Z_i, -Z_i)\}$ are more regularly distributed than a collection of 2n independent samples. In particular, the sample mean over the antithetic pairs always equals the population mean of 0, whereas the mean over finitely many independent samples is almost surely different from 0. If the inputs are made more regular, it may be hoped that the outputs are more regular as well. Indeed, a large value of $S_T^{(i)}$ resulting from a large $Z_i$ will be paired with a small value of $\tilde S_T^{(i)}$ obtained from $-Z_i$.

A more precise argument compares efficiencies. Because $C_i$ and $\tilde C_i$ have the same variance,

\[ \mathrm{Var}\left[\frac{C_i + \tilde C_i}{2}\right] = \frac{1}{2}\left(\mathrm{Var}[C_i] + \mathrm{Cov}[C_i, \tilde C_i]\right). \tag{4} \]

Thus, we have $\mathrm{Var}[\hat C_{AV}] \le \mathrm{Var}[\hat C]$ if $\mathrm{Cov}[C_i, \tilde C_i] \le \mathrm{Var}[C_i]$. However, $\hat C_{AV}$ uses twice as many replications as $\hat C$, so we must account for differences in computational requirements. If generating the $Z_i$ takes a negligible fraction of the work per replication (which would typically be the case in the pricing of a more elaborate option), then the work to generate $\hat C_{AV}$ is roughly double the work to generate $\hat C$. Thus, for antithetics to increase efficiency, we require

\[ 2\,\mathrm{Var}[\hat C_{AV}] \le \mathrm{Var}[\hat C], \]

which, in light of (4), simplifies to the requirement that $\mathrm{Cov}[C_i, \tilde C_i] \le 0$.

That this condition is met is easily demonstrated. Define $\phi$ so that $C_i = \phi(Z_i)$; $\phi$ is the composition of the mappings from $Z_i$ to the stock price and from the stock price to the discounted option payoff. As the composition of two increasing functions, $\phi$ is monotone, so by a standard inequality (e.g., Section 2.2 of Barlow and Proschan 1975)

\[ E[\phi(Z_i)\phi(-Z_i)] \le E[\phi(Z_i)]\,E[\phi(-Z_i)], \tag{5} \]

i.e., $\mathrm{Cov}[C_i, \tilde C_i] \equiv E[\phi(Z_i)\phi(-Z_i)] - E[\phi(Z_i)]E[\phi(-Z_i)] \le 0$, and we may conclude that antithetics help.

⁴ This method was introduced to option pricing in Boyle (1977), where its use was illustrated in the pricing of a European call on a dividend-paying stock.

This argument can be adapted to show that the method of antithetic variates increases efficiency in pricing a European put and other options that depend monotonically on inputs (e.g., Asian options). The notable departure from monotonicity in some barrier options (e.g., a down-and-in call) suggests that the use of antithetics in pricing these options may sometimes be less effective.

In computing confidence intervals with antithetic variates, it is essential that the standard error be estimated using the sample standard deviation of the n averaged pairs $(C_i + \tilde C_i)/2$ and not the 2n individual observations $C_1, \tilde C_1, \ldots, C_n, \tilde C_n$. The averaged pairs are independent but the individual observations are not. This is a case (we will see others shortly) in which the use of a variance reduction technique affects the estimation of the standard error and, in particular, requires some "batching" of observations to deal with dependence.

It is worth noting that the method of antithetic variates is by no means restricted to simulations whose only stochastic inputs are standard normal variates. The most primitive stochastic input in most simulations is a sequence $\{U_n\}$ of independent variates uniformly distributed on the unit interval. In this case, $1 - U_n$ has the same distribution as $U_n$, and the pair $(U_n, 1 - U_n)$ are called antithetic because they exhibit negative dependence. If the simulation output depends monotonically on the input random numbers, then the output obtained from $\{1 - U_1, 1 - U_2, \ldots\}$ will be negatively correlated with that obtained from $\{U_1, U_2, \ldots\}$, resulting in increased efficiency compared with independent replications.
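The antithetic estimator and the pairing rule for its standard error can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names, seeds and sample sizes are assumptions, and the comparison gives plain Monte Carlo 2n independent replications against n antithetic pairs so that both use roughly the same work.

```python
import math
import random


def discounted_call_payoff(z, s0, k, r, sigma, t):
    """Discounted European call payoff for one standard normal draw z, via (2)."""
    s_t = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(0.0, s_t - k)


def mean_and_std_error(values):
    """Sample mean and its standard error for independent observations."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


def plain_estimate(n, seed, **p):
    rng = random.Random(seed)
    return mean_and_std_error(
        [discounted_call_payoff(rng.gauss(0.0, 1.0), **p) for _ in range(n)])


def antithetic_estimate(n_pairs, seed, **p):
    rng = random.Random(seed)
    pair_averages = []
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        c = discounted_call_payoff(z, **p)
        c_tilde = discounted_call_payoff(-z, **p)
        # The standard error is computed from the n averaged pairs,
        # which are independent, not from the 2n individual payoffs.
        pair_averages.append(0.5 * (c + c_tilde))
    return mean_and_std_error(pair_averages)


params = dict(s0=100.0, k=100.0, r=0.10, sigma=0.4, t=0.2)
plain_price, plain_se = plain_estimate(40_000, seed=1, **params)
av_price, av_se = antithetic_estimate(20_000, seed=2, **params)
print(f"plain      {plain_price:.3f} +/- {plain_se:.3f}")
print(f"antithetic {av_price:.3f} +/- {av_se:.3f}")
```

Because the call payoff is monotone in the normal input, the antithetic standard error should come out below the plain one at equal work.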
For further general background on antithetic variates and other methods based on correlation induction, see Bratley, Fox, and Schrage (1987), Hammersley and Handscomb (1964), Glynn and Iglehart (1988), and references there. For some examples of application in finance, see Boyle (1977), Clewlow and Carverhill (1994), and Hull and White (1987).

2.3 Control variates

The method of control variates is among the most widely applicable, easiest to use, and effective of the variance reduction techniques.⁵ Simply put, the principle underlying this technique is "use what you know."

⁵ The earliest application of this technique to option pricing is Boyle (1977).

The most straightforward implementation of control variates replaces the evaluation of an unknown expectation with the evaluation of the difference between the unknown quantity and another expectation whose value is known. A specific illustration can be found in the analysis of Boyle and Emanuel (1985) and Kemna and Vorst (1990) of Asian options. Let $P_A$ be the price of an option whose payoff depends on the arithmetic average of the underlying asset. Let $P_G$ be the price of an option equivalent in every respect except that a geometric average replaces the arithmetic average. Most options based on averages use arithmetic averaging, so $P_A$ is of much greater practical value; but whereas $P_A$ is analytically intractable, $P_G$ can often be evaluated in closed form. Can knowledge of $P_G$ be leveraged to compute $P_A$? It can, through the control variate method.

Write $P_A = E[\hat P_A]$ and $P_G = E[\hat P_G]$, where $\hat P_A$ and $\hat P_G$ are the discounted option payoffs for a single simulated path of the underlying asset. Then $P_A = P_G + E[\hat P_A - \hat P_G]$; in other words, $P_A$ can be expressed as the known price $P_G$ plus the expected difference between $\hat P_A$ and $\hat P_G$. An unbiased estimator of $P_A$ is thus provided by

\[ \hat P_A^{cv} = \hat P_A + (P_G - \hat P_G). \tag{6} \]

This representation⁶ suggests a slightly different interpretation: $\hat P_A^{cv}$ adjusts the straightforward estimator $\hat P_A$ according to the difference between the known value $P_G$ and the observed value $\hat P_G$. The known error $(P_G - \hat P_G)$ is used as a control in the estimation of $P_A$. If most of the computational effort goes to generating paths of the underlying asset, then the additional work required to evaluate $\hat P_G$ along with $\hat P_A$ is minor. It therefore seems reasonable to compare variances alone. Since

\[ \mathrm{Var}[\hat P_A^{cv}] = \mathrm{Var}[\hat P_A] + \mathrm{Var}[\hat P_G] - 2\,\mathrm{Cov}[\hat P_A, \hat P_G], \]

this method is effective if the covariance between $\hat P_A$ and $\hat P_G$ is large. The numerical results of Kemna and Vorst indicate that this is indeed the case. Fu, Madan, and Wang (1998) have investigated the use of other control variates for Asian options, based on Laplace transform values. These appear to be less strongly correlated with the option price.

⁶ To go from (6) to Boyle's (1977) example, let $P_G$ be the price of a European call option on a no-dividend stock and let $P_A$ be the corresponding option price in the presence of dividends.

A closer examination of (6) reveals that this estimator does not make optimal use of the relation between the two option prices. Consider the family of unbiased estimators

\[ \hat P_A^{\beta} = \hat P_A + \beta(P_G - \hat P_G), \tag{7} \]

parameterized by the scalar $\beta$. We have

\[ \mathrm{Var}[\hat P_A^{\beta}] = \mathrm{Var}[\hat P_A] + \beta^2\,\mathrm{Var}[\hat P_G] - 2\beta\,\mathrm{Cov}[\hat P_A, \hat P_G]. \]

The variance-minimizing $\beta$ is therefore

\[ \beta^* = \frac{\mathrm{Cov}[\hat P_A, \hat P_G]}{\mathrm{Var}[\hat P_G]}. \]
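To make the role of β concrete, here is a hedged Python sketch of an estimator of the form (7) with the coefficient replaced by its sample estimate (sample covariance over sample variance). To keep the example short it does not use the Asian option pair; instead, as an assumption made purely for illustration, the target is a European call and the control is the discounted terminal stock price, whose risk-neutral expectation is the known value S0. The function name and the returned dictionary keys are hypothetical.

```python
import math
import random


def control_variate_call(s0, k, r, sigma, t, n, seed=0):
    """European call priced with the discounted terminal price as a control."""
    rng = random.Random(seed)
    disc = math.exp(-r * t)
    payoffs, controls = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
        payoffs.append(disc * max(0.0, s_t - k))
        controls.append(disc * s_t)  # known expectation: s0

    mean_y = sum(payoffs) / n
    mean_x = sum(controls) / n
    cov = sum((y - mean_y) * (x - mean_x)
              for y, x in zip(payoffs, controls)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in controls) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in payoffs) / (n - 1)
    beta = cov / var_x  # sample estimate of the variance-minimizing coefficient

    # Adjusted observations y_i + beta * (known mean - x_i).  Estimating beta
    # from the same sample introduces a small bias that vanishes as n grows.
    adjusted = [y + beta * (s0 - x) for y, x in zip(payoffs, controls)]
    mean_adj = sum(adjusted) / n
    var_adj = sum((a - mean_adj) ** 2 for a in adjusted) / (n - 1)
    return {
        "plain": mean_y, "plain_se": math.sqrt(var_y / n),
        "cv": mean_adj, "cv_se": math.sqrt(var_adj / n),
        "beta": beta,
    }


result = control_variate_call(100.0, 100.0, 0.10, 0.4, 0.2, n=20_000)
print(result)
```

The same pattern would apply to the Asian option case by replacing the payoff with the arithmetic-average payoff and the control with the geometric-average payoff and its closed-form price.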

Depending on the application, $\beta^*$ may or may not be close to 1, the implicit value in (6). In using an estimator of the form (6), we forgo an opportunity for greater variance reduction. Indeed, whereas (6) may increase or decrease variance, an estimator based on $\beta^*$ is guaranteed not to increase variance, and will result in a strict decrease in variance so long as $\hat P_A$ and $\hat P_G$ are not uncorrelated.

In practice, of course, we rarely know $\beta^*$ because we rarely know $\mathrm{Cov}[\hat P_A, \hat P_G]$. However, given n independent replications $\{(P_{Ai}, P_{Gi}), i = 1, \ldots, n\}$ of the pairs $(\hat P_A, \hat P_G)$ we can estimate $\beta^*$ via regression. At this point we face a choice. Using all n replications to compute an estimate $\hat\beta$ of $\beta^*$ introduces a bias in the estimator

\[ \frac{1}{n}\sum_{i=1}^{n} P_{Ai} + \hat\beta\left(P_G - \frac{1}{n}\sum_{i=1}^{n} P_{Gi}\right), \]

and its estimated standard error because of the dependence between $\hat\beta$ and the $P_{Gi}$. Reserving $n_1$ replications for the estimation of $\beta^*$ and the remaining $n - n_1$ replications for the sample mean of the $P_{Gi}$ (typically with $n_1 \ll n$) eliminates the bias but may deteriorate the estimate of $\beta^*$. Neither issue significantly limits the applicability of the method, because the possible bias vanishes as n increases and because the estimate of $\beta^*$ need not be very precise to achieve a reduction in variance.

The advantage of working with (7) over (6) becomes even more pronounced when further controls are introduced. For example, when the asset price is simulated under risk-neutral probabilities, the present value $e^{-rT}E[S_T]$ of the terminal price must equal the current price $S_0$. We can therefore form the estimator

\[ \hat P_A + \beta_1(P_G - \hat P_G) + \beta_2(S_0 - e^{-rT}S_T). \]

The variance-minimizing coefficients $(\beta_1^*, \beta_2^*)$ are easily found by multiple regression. This optimization step seems particularly crucial in this case; for whereas one might guess that $\beta_1^*$ is close to 1, it seems unlikely that $\beta_2^*$ would be. Optimizing over the $\beta$s also allows us to exploit controls that are negatively correlated with the option payoff.

For further general background on control variates see Bratley, Fox, and Schrage (1987), Glynn and Iglehart (1988), and Lavenberg and Welch (1981). For examples of control variate applications in finance, see Boyle (1977), Boyle and Emanuel (1985), Broadie and Glasserman (1996), Carverhill and Pang (1995), Clewlow and Carverhill (1994), Duan (1995), and Kemna and Vorst (1990).

2.4 Moment matching methods

Next we describe a variance reduction technique proposed by Barraquand (1995), who termed it quadratic resampling. His technique is based on moment matching. As before, we introduce it with the simple example of estimating the European call option price on a single asset and then generalize.

Let $Z_i$, $i = 1, \ldots, n$, denote independent standard normals used to drive a simulation. The sample moments of the n $Z$'s will not exactly match those of the standard normal. The idea of moment matching is to transform the $Z$'s to match a finite number of the moments of the underlying population. For example, the first moment of the standard normal can be matched by defining

\[ \tilde Z_i = Z_i - \bar Z, \quad i = 1, \ldots, n, \tag{8} \]

where $\bar Z = \sum_{i=1}^{n} Z_i/n$ is the sample mean of the $Z$'s. Note that the $\tilde Z_i$'s are normally distributed if the $Z_i$'s are normal. However, the $\tilde Z_i$'s are not independent. As before, terminal stock prices are generated from the formula

\[ \tilde S_T(i) = S_0 e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T}\tilde Z_i}, \quad i = 1, \ldots, n. \]

An estimator of the call option price is the average of the n values $\tilde C_i = e^{-rT}\max(\tilde S_T(i) - K, 0)$. In the standard Monte Carlo method, confidence intervals for the true value $C$ could be estimated from the sample mean and variance of the estimator. This cannot be done here since the n values of $\tilde Z$ are no longer independent, and hence the values $\tilde C_i$ are not independent. This points out one drawback of the moment matching method: confidence intervals are not as easy to obtain.⁷ Indeed, for confidence intervals it appears to be necessary to apply moment matching to independent batches of runs and estimate the standard error from the batch means. This reduces the efficacy of the method compared with matching moments across all runs.

⁷ The point is not merely a minor technical issue. The sample variance of the $\tilde C_i$'s is usually a poor estimate of $\mathrm{Var}[\tilde C_i]$.

Equation (8) showed one way to match the first moment of a distribution with mean zero. If the underlying population does not have a zero mean, transformed $Z$'s could be generated using $\tilde Z_i = Z_i - \bar Z + \mu_Z$, where $\mu_Z$ is the population mean. The idea can easily be extended to match two moments of a distribution. In this case, an appropriate transformation is

\[ \tilde Z_i = (Z_i - \bar Z)\frac{\sigma_Z}{s_Z} + \mu_Z, \quad i = 1, \ldots, n, \tag{9} \]

where $s_Z$ is the sample standard deviation of the $Z_i$'s and $\sigma_Z$ is the population standard deviation. Of course, for a standard normal, $\mu_Z = 0$ and $\sigma_Z = 1$. An estimator of the call option price is the average of the n values $\tilde C_i$. Using the transformation (9), the $\tilde Z_i$'s are not normally distributed even if the $Z_i$'s are normal. Hence, the corresponding $\tilde C_i$ are biased estimators of the true option value. For most financial problems of practical interest, this bias is likely to be small. However, the bias can be arbitrarily large in extreme circumstances (even when only the first moment of the distribution is matched).⁸ The dependence and bias in the moment matching method make it difficult to quantify the improvement in general analytical terms.

⁸ For example, let $Z$ take the values +1 or −1 with probability one-half. Consider a security which pays +$1 if $Z = 1$ and −$x if $Z = -1$. The expected payoff of the security is $(1 - x)/2$. To estimate this expected payoff by Monte Carlo simulation, draw n samples $Z_i$ according to the prescribed distribution. Then use equation (8) to define $\tilde Z_i$'s which match the first moment. For almost all samples for any large n, the estimated expected payoff is $-x$ and the bias is $(1 + x)/2$. This bias does not decrease as n increases. Care must be taken when using equation (8) or (9) when the support of the random variable is not the entire real line. For example, applying (8) or (9) to uniform or exponential random variables could cause the transformed values to fall outside of the relevant domain.

The moment matching method is another example of the idea to "use what you know." In this simple European option example, the mean and variance of the terminal stock price $S_T$ are also known. So the moment matching idea could be applied to the simulated terminal stock values $S_T(i)$. In this case, to match the first moment, define

\[ \tilde S_T(i) = S_T(i) - \bar S_T + \mu_{S_T}, \tag{10} \]

where $\mu_{S_T} = S_0 e^{rT}$ and $\bar S_T$ is the sample mean of the $S_T(i)$'s. To match the first two moments, define

\[ \tilde S_T(i) = (S_T(i) - \bar S_T)\frac{\sigma_{S_T}}{s_{S_T}} + \mu_{S_T}, \tag{11} \]

where $\sigma_{S_T} = S_0\sqrt{e^{2rT}(e^{\sigma^2 T} - 1)}$ and $s_{S_T}$ is the sample standard deviation of the $S_T(i)$'s. Duan and Simonato (1998) use a related method. They apply a multiplicative transformation to asset prices to enforce the martingale property over a finite set of paths.⁹ They apply their method to GARCH option pricing.

⁹ This is equivalent to enforcing put-call parity.

Comparisons of various moment matching strategies are given in Table 1. For this comparison, n = 100 simulation trials were used to estimate the European call option price. Standard errors were estimated by re-simulation. That is, m = 10 000 simulation trials were conducted, each one based on n replications of the estimator. The sample standard deviation of the m simulation estimates gives an estimate of the standard error of a single simulation estimate. Root-mean-squared errors are not reported because they are identical to the standard errors for the number of digits reported.


Table 1. Standard errors for European call options.

σ     S0/K   No variance   MM1            MM2            MM1             MM2
             reduction     Equation (8)   Equation (9)   Equation (10)   Equation (11)

0.2   0.9    0.24          0.19           0.11           0.19            0.09
      1.0    0.62          0.29           0.09           0.26            0.10
      1.1    0.93          0.19           0.09           0.15            0.11

0.4   0.9    0.80          0.55           0.24           0.51            0.17
      1.0    1.22          0.66           0.19           0.56            0.23
      1.1    1.61          0.63           0.17           0.48            0.28

0.6   0.9    1.40          0.95           0.38           0.84            0.28
      1.0    1.93          1.10           0.31           0.91            0.39
      1.1    2.38          1.13           0.25           0.85            0.49

All results are based on n = 100 simulation trials. The option parameters are: K = 100, r = 0.10, T = 0.2, with S0 and σ varying as indicated. Standard error estimates are based on m = 10 000 simulations.

The results in Table 1 show that matching two moments can reduce the simulation error by a factor ranging from 2 to 10. Matching two moments dominates matching one moment, but there is not a clear choice between transforming the original standard normals using (9) or the terminal stock prices using (11). Further computational results, not included in Table 1, indicate that the improvement factor with moment matching is essentially constant as n increases. This may seem counterintuitive, since the moment matching adjustments converge to zero as n increases. But the progressively smaller adjustments are equally important in reducing the estimation error as the number of simulation trials increases. For example, the standard error for n = 10 000 simulation trials is one-tenth of the corresponding number for n = 100 reported in Table 1. The moment matching method can be extended to match covariances. For options that depend on multiple assets, the entire covariance structure is typically a simulation input. Barraquand (1995) suggests a method to match the entire covariance structure and reports error reduction factors ranging from two to several hundred for this method applied to pricing options on the maximum of k assets. The moment matching procedure could be applied to matching higher order moments as well. In addition to different methods for transforming random outcomes to match specified moments, additional points could be added as another way to match moments. Whenever a moment is known, it can be used as a control rather than for moment matching. In an appendix, we give a theoretical argument favoring the use of moments as controls rather than for matching.
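The transformations (8) and (9), together with the re-simulation procedure used to estimate standard errors, can be sketched in Python as below. This is a rough sketch under stated assumptions rather than a reproduction of the reported experiment: the function names are hypothetical, only one parameter set (S0 = 100, σ = 0.4) is used, and the number of batches is cut to m = 200 (instead of 10 000) to keep the run short, so the resulting numbers are only indicative.

```python
import math
import random
import statistics

# One parameter set from the table (K = 100, r = 0.10, T = 0.2).
S0, K, R, SIGMA, T = 100.0, 100.0, 0.10, 0.4, 0.2


def match_first_moment(z):
    """Equation (8): shift the sample so that its mean is exactly 0."""
    z_bar = statistics.fmean(z)
    return [zi - z_bar for zi in z]


def match_two_moments(z, mu=0.0, sigma=1.0):
    """Equation (9): shift and rescale so the sample mean and std. dev. match."""
    z_bar = statistics.fmean(z)
    s_z = statistics.stdev(z)
    return [(zi - z_bar) * sigma / s_z + mu for zi in z]


def call_estimate(z):
    """Average discounted call payoff over the normal draws in z."""
    drift = (R - 0.5 * SIGMA**2) * T
    vol = SIGMA * math.sqrt(T)
    disc = math.exp(-R * T)
    return statistics.fmean(
        disc * max(0.0, S0 * math.exp(drift + vol * zi) - K) for zi in z)


def batch_standard_errors(n=100, m=200, seed=0):
    """Standard error of one n-replication estimate, from m independent batches."""
    rng = random.Random(seed)
    estimates = {"none": [], "mm1": [], "mm2": []}
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates["none"].append(call_estimate(z))
        estimates["mm1"].append(call_estimate(match_first_moment(z)))
        estimates["mm2"].append(call_estimate(match_two_moments(z)))
    return {name: statistics.stdev(vals) for name, vals in estimates.items()}


errors = batch_standard_errors()
print({name: round(value, 2) for name, value in errors.items()})
```

The standard errors are taken across batches because, within a batch, the transformed draws are dependent and their sample variance is not a reliable guide to the error.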


2.5 Stratified and Latin hypercube sampling

Like many variance reduction techniques, stratified sampling seeks to make the inputs to simulation more regular than random inputs. In particular, it forces certain empirical probabilities to match theoretical probabilities, just as moment matching forces empirical moments to match theoretical moments. Consider, for example, the generation of 100 normal random variates as inputs to a simulation. The empirical distribution of an independent sample $Z_1, \ldots, Z_{100}$ will look only roughly like the normal density; the tails of the distribution – often the most important part – will inevitably be underrepresented. Stratified sampling can be used to force exactly one observation to lie between the $(i-1)$th and $i$th percentile, $i = 1, \ldots, 100$, and thus produce a better match to the normal distribution. One way to implement this generates 100 independent random variates $U_1, \ldots, U_{100}$, uniform on [0, 1], and sets $\tilde Z_i = N^{-1}((i + U_i - 1)/100)$, $i = 1, \ldots, 100$, where $N^{-1}$ is the inverse of the cumulative normal distribution. This works because $(i + U_i - 1)/100$ falls between the $(i-1)$th and $i$th percentiles of the uniform distribution, and percentiles are preserved by the inverse transform. Of course, $\tilde Z_1, \ldots, \tilde Z_{100}$ are highly dependent, complicating the estimation of standard errors. Computing confidence intervals with stratified sampling typically requires batching the runs. For example, with a budget of 100 000 replications we might run 100 independent stratified samples each of size 1000, rather than a single stratified sample of size 100 000. To estimate standard errors we must therefore sacrifice some variance reduction, just as with moment matching.

In principle, this approach applies in arbitrary dimensions. To generate a stratified sample from the d-dimensional unit hypercube, with n strata in each coordinate, we could generate a sequence of vectors $U_j = (U_j^{(1)}, \ldots, U_j^{(d)})$, $j = 1, 2, \ldots$, and then set

\[ V_j = \frac{U_j + (i_1, \ldots, i_d)}{n}, \quad i_k = 0, \ldots, n-1, \quad k = 1, \ldots, d. \]

Exactly one $V_j$ will lie in each of the $n^d$ cubes defined by the product of the n strata in each coordinate. The difficulty in high dimensions is that generating even a single stratified sample of size $n^d$ may be prohibitive unless n is very small.

Latin hypercube sampling can be viewed as a way of randomly sampling n points of a stratified sample while preserving some of the regularity from stratification. The method was introduced by McKay, Conover, and Beckman (1979) and further analyzed in Stein (1987). It works as follows. Let $\pi_1, \ldots, \pi_d$ be independent random permutations of $\{1, \ldots, n\}$, each uniformly distributed over all $n!$ possible permutations. Set

\[ V_j^{(k)} = \frac{U_j^{(k)} + \pi_k(j) - 1}{n}, \quad k = 1, \ldots, d, \quad j = 1, \ldots, n. \]


P. Boyle, M. Broadie and P. Glasserman

The randomization ensures that each vector $V_j$ is uniformly distributed over the $d$-dimensional hypercube. At the same time, the coordinates are perfectly stratified in the sense that exactly one of $V_1^{(k)}, \ldots, V_n^{(k)}$ falls between $(j-1)/n$ and $j/n$, $j = 1, \ldots, n$, for each dimension $k = 1, \ldots, d$. As before, the dependence introduced by this method implies that standard errors can be estimated only through batching.

These methods can be viewed as part of a hierarchy of methods introducing additional levels of regularity in inputs at the expense of complicating the estimation of errors. Some, like stratified sampling, fix the size of the sample while others leave flexibility. The extremes of this hierarchy are straightforward Monte Carlo (completely random) and the low-discrepancy methods (completely deterministic) discussed in Section 3. Owen (1995a, 1995b) discusses these and other methods and introduces a hybrid that combines the regularity of low-discrepancy methods with the simple error estimation of standard Monte Carlo. Shaw (1995) uses an extension proposed by Stein (1987) to handle dependent inputs in a novel approach to estimating value at risk.
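The two constructions of this section can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `stratified_normals` and `latin_hypercube`, the seed, and the sample sizes are all assumptions made here, and the probability is clamped to the open interval (0, 1) only to guard the inverse normal against floating-point edge cases.

```python
import random
from statistics import NormalDist


def stratified_normals(n, rng):
    """One stratified sample of n standard normals, one per 1/n probability band."""
    inv = NormalDist().inv_cdf
    eps = 2.0 ** -53  # keeps the argument of inv_cdf strictly inside (0, 1)
    out = []
    for i in range(1, n + 1):
        p = (i + rng.random() - 1) / n          # lies in the i-th stratum
        out.append(inv(min(max(p, eps), 1.0 - eps)))
    return out


def latin_hypercube(n, d, rng):
    """n points in [0, 1)^d with exactly one point per stratum in each coordinate."""
    points = [[0.0] * d for _ in range(n)]
    for k in range(d):
        perm = list(range(1, n + 1))            # pi_k, a uniform random permutation
        rng.shuffle(perm)
        for j in range(n):
            points[j][k] = (rng.random() + perm[j] - 1) / n
    return points


rng = random.Random(12345)       # arbitrary seed, for reproducibility only
z = stratified_normals(100, rng)  # z[i] falls between the i-th and (i+1)-th percentiles
v = latin_hypercube(8, 3, rng)    # 8 points in the 3-dimensional unit cube
```

Because the points within one such sample are dependent, a standard error would be obtained by repeating the whole construction with independent random inputs and treating each repetition as one batch, as described above.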

2.6 Some numerical comparisons

The variance reduction methods discussed thus far are fairly generic, in the sense that they do not rely on the detailed structure of the security to be priced. This contrasts with the remaining two methods that we discuss: importance sampling and conditional Monte Carlo. These methods must be carefully tailored to each application. It therefore seems appropriate to digress briefly into a numerical comparison of the generic methods on some option pricing problems.

We first examine the performance of these methods in pricing Asian options. The payoff of a discretely sampled arithmetic average Asian option is $\max(\bar{S} - K, 0)$, where $\bar{S} = \sum_{i=1}^{k} S_i / k$, $S_i$ is the asset price at time $t_i = iT/k$, and $T$ is the option maturity. The value of the option is $E[e^{-rT} \max(\bar{S} - K, 0)]$. There is no easily evaluated closed-form expression for this option value. Various formulas to approximate the Asian option price have been developed, but simulation is usually used to test the accuracy of the approximations.

For this Asian option, $k$ random numbers are needed to simulate one option payoff, and $nk$ random numbers are needed in total. Moment matching (MM2, for two moments) was applied $k$ times to the $n$ numbers used to generate each $S_i$ at time $t_i$. Latin hypercube sampling (LHS) was applied to sample $n$ points from the $k$-dimensional unit cube. The discretely sampled geometric average Asian price was used as a control variate (see Turnbull and Wakeman 1991 for a closed-form solution for this price). Results appear in Table 2.

The results in Table 2 indicate that matching two moments can reduce the simulation error by a factor ranging from 1 to 10. Using the geometric average Asian

6. Monte Carlo Methods for Security Pricing


Table 2. Standard errors for arithmetic average Asian options.

σ      K/S0    No variance    Antithetic    Control    MM2      LHS
               reduction      method        variate
0.2    0.9     0.053          0.052         0.003      0.048    0.049
0.2    1.0     0.344          0.231         0.004      0.162    0.161
0.2    1.1     0.566          0.068         0.006      0.052    0.058
0.4    0.9     0.308          0.297         0.014      0.240    0.248
0.4    1.0     0.694          0.506         0.017      0.352    0.354
0.4    1.1     1.017          0.388         0.021      0.281    0.289
0.6    0.9     0.632          0.583         0.032      0.451    0.455
0.6    1.0     1.052          0.817         0.038      0.566    0.578
0.6    1.1     1.443          0.759         0.047      0.539    0.560

All results are based on n = 100 simulation trials with k = 50 prices in the average. The option parameters are: K = 100, r = 0.10, T = 0.2, with S0 and σ varying as indicated. Standard error estimates based on m = 10 000 simulations. The geometric average Asian option is used as the control variate. Moment matching (MM2) was applied to the i th price in the average, i = 1, . . . , 5, across replications.

option price as a control variate reduces error by a factor ranging from 20 to 100, and is consistently the most effective method. LHS and MM2 perform similarly. Antithetics are consistently dominated by the other methods. Next we compare these variance reduction techniques in pricing down-and-out call options with discrete barriers. The payoff of this option at expiration is the standard call option payoff if the asset price Si exceeds the barrier H at all times ti = i T /k, i = 1, . . . , k, otherwise the payoff is zero. The option is knocked out if Si ≤ H at any time ti . As a control we use the Black–Scholes price of a standard call. Moment matching and LHS are implemented as with the Asian option. Results are given in Table 3. These are consistent with the pattern in Table 2, except that the superiority of the control variate method is less pronounced. Although it is always risky to draw conclusions from limited numerical evidence, we suggest the following broad conclusions. The antithetic method is easy to implement, but often leads to only modest error reductions. Moment matching is similarly easy to implement and often leads to significant error reductions, but the error estimation is more difficult and bias is a potential problem. LHS suffers from the same error estimation difficulty but does not introduce bias. The control variate technique can lead to very substantial error reductions, but its effectiveness hinges on finding a good control for each problem.
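As an illustration of why the geometric average works so well as a control, the following sketch prices an at-the-money arithmetic Asian call with and without the control variate. It is not the authors' code: the function names, the seed, the sample size n = 2000 and the choice S0 = K = 100, σ = 0.4 are assumptions made here, and the closed form used for the geometric average is the standard lognormal one obtained from the mean and variance of the log of the geometric average. The coefficient β is estimated from the same sample, which introduces a small bias that is ignored in this sketch.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def geometric_asian_call(s0, strike, r, sigma, T, k):
    """Closed-form price of the discretely sampled geometric-average Asian call."""
    # The log of the geometric average is normal with mean m and variance v.
    m = math.log(s0) + (r - 0.5 * sigma**2) * T * (k + 1) / (2 * k)
    v = sigma**2 * T * (k + 1) * (2 * k + 1) / (6 * k**2)
    d2 = (m - math.log(strike)) / math.sqrt(v)
    d1 = d2 + math.sqrt(v)
    N = NormalDist().cdf
    return math.exp(-r * T) * (math.exp(m + 0.5 * v) * N(d1) - strike * N(d2))


def asian_payoffs(s0, strike, r, sigma, T, k, n, rng):
    """Simulate n paths; return discounted arithmetic and geometric payoffs."""
    dt = T / k
    drift, vol = (r - 0.5 * sigma**2) * dt, sigma * math.sqrt(dt)
    disc = math.exp(-r * T)
    arith, geom = [], []
    for _ in range(n):
        log_s, sum_s, sum_log = math.log(s0), 0.0, 0.0
        for _ in range(k):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
            sum_s += math.exp(log_s)
            sum_log += log_s
        arith.append(disc * max(sum_s / k - strike, 0.0))
        geom.append(disc * max(math.exp(sum_log / k) - strike, 0.0))
    return arith, geom


def control_variate_estimate(y, x, x_mean):
    """Estimate E[y] using the control x whose true mean x_mean is known."""
    my, mx = mean(y), mean(x)
    cov = sum((a - my) * (b - mx) for a, b in zip(y, x))
    var = sum((b - mx) ** 2 for b in x)
    beta = cov / var                      # estimated from the same sample
    adj = [a - beta * (b - x_mean) for a, b in zip(y, x)]
    return mean(adj), stdev(adj) / math.sqrt(len(adj))


rng = random.Random(2001)                 # arbitrary seed
params = dict(s0=100.0, strike=100.0, r=0.10, sigma=0.4, T=0.2, k=50)
arith, geom = asian_payoffs(n=2000, rng=rng, **params)
geo_exact = geometric_asian_call(**params)
plain_est, plain_se = mean(arith), stdev(arith) / math.sqrt(len(arith))
cv_est, cv_se = control_variate_estimate(arith, geom, geo_exact)
```

Comparing `plain_se` with `cv_se` gives an informal check of the error reduction reported for the control variate column of Table 2, although the sample sizes here differ from those of the table.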


Table 3. Standard errors for down-and-out call options with discrete barriers.

σ      K/S0    No variance    Antithetic    Control    MM2     LHS
               reduction      method        variate
0.2    0.9     0.96           0.44          0.37       0.43    0.39
0.2    1.0     0.62           0.44          0.13       0.31    0.30
0.2    1.1     0.30           0.28          0.03       0.22    0.22
0.4    0.9     1.59           1.15          0.73       0.95    0.88
0.4    1.0     1.22           1.00          0.45       0.76    0.74
0.4    1.1     0.88           0.82          0.26       0.61    0.61
0.6    0.9     2.19           1.83          1.07       1.44    1.36
0.6    1.0     1.86           1.62          0.80       1.25    1.23
0.6    1.1     1.54           1.40          0.58       1.09    1.09

All results are based on n = 100 simulation trials. There are k = 5 points in the discrete barrier at 95. The other option parameters are: S0 = 100, r = 0.10, T = 0.2, with K and σ varying as indicated. Standard error estimates are based on m = 10 000 simulations. The standard European call option (Black–Scholes formula) is used as the control variate. Moment matching (MM2) was applied to the i th return, i = 1, . . . , 5, across replications.

2.7 Importance sampling

This technique builds on the observation that an expectation under one probability measure can be expressed as an expectation under another through the use of a likelihood ratio or Radon–Nikodym derivative. This idea is familiar in finance because it underlies the representation of prices as expectations under a martingale measure. In Monte Carlo, the change of measure is used to try to obtain a more efficient estimator. We present some examples using this technique; for general background see Bratley et al. (1987) or Hammersley and Handscomb (1964).

As a simple example, consider the evaluation of the Black–Scholes price of a call option, i.e., the computation of $e^{-rT} E[\max\{S_T - K, 0\}]$ with $S_T$ as in (2). A straightforward approach generates samples of the terminal value $S_T$ consistent with a geometric Brownian motion having drift $r$ and volatility $\sigma$, just as in (2). But we are in fact free to generate $S_T$ consistent with any other drift $\mu$, provided we weight the result with a likelihood ratio. For emphasis, we subscript the expectation operator with the drift parameter. Then

$$E_r[\max\{S_T - K, 0\}] = E_\mu[\max\{S_T - K, 0\} L],$$

where the likelihood ratio $L$ is the ratio of the lognormal densities with parameters $r$ and $\mu$ evaluated at $S_T$, given by

$$L = \left(\frac{S_T}{S_0}\right)^{(r-\mu)/\sigma^2} \exp\left(\frac{(\mu^2 - r^2)T}{2\sigma^2}\right).$$

Indeed, $S_T$ need not even be sampled from a lognormal distribution. The only requirement is that the support of the importance sampling measure contain the support of the original measure so that the likelihood ratio is well-defined; this is an absolute continuity requirement. In the example above, this means that any distribution for $S_T$ whose support includes $(0, \infty)$ is admissible.

Ideally, one would like to choose the importance sampling distribution to reduce variance. In the example above, one obtains a zero-variance estimator by sampling $S_T$ from the density

$$f(x) = c^{-1} \max\{x - K, 0\} e^{-rT} g(x),$$

where $g$ is the (lognormal) density of $S_T$ and $c$ is a normalizing constant that makes $f$ integrate to 1. The difficulty is that $c$ is the Black–Scholes price itself, so this method requires knowledge of the solution for its implementation. Nevertheless, it gives some indication of the potential gain from importance sampling.

Reider (1993) has investigated the impact of importance sampling based on a change of drift and volatility. (Changing the volatility is consistent with absolute continuity in a discrete-time approximation of a diffusion though not in the continuous-time limit.) He finds that choosing the importance sampling distribution to have higher drift and volatility provides substantial variance reduction in pricing deep out-of-the-money options. He also investigates the combination of importance sampling with antithetic variates and control variates, and the use of put-call parity for indirect estimation. Nielsen (1994) has explored some related importance sampling ideas in sampling from a binomial tree.

Andersen (1995) has developed a powerful application of importance sampling for simulating interest rates and has applied it to nonlinear stochastic differential equation models. We briefly describe his approach. Let $r_t$ be the instantaneous short rate described, e.g., by a diffusion model. Then

$$B(T) = E\left[\exp\left(-\int_0^T r_t \, dt\right)\right]$$

is the price today of a zero-coupon bond with a face value of one dollar, maturing at time $T$. In, for example, the Cox–Ingersoll–Ross and Vasicek models,¹⁰ $B(T)$ is available in closed form.

¹⁰ See, e.g., Hull (1993, Chapter 15) for background on these models.

We may therefore define a new probability measure $\bar{P}$ by setting

$$\bar{P}(A) = E\left[\exp\left(-\int_0^T r_t \, dt - \log B(T)\right) 1_A\right]$$

for any event $A$, where $1_A$ denotes the indicator of the event $A$. Let $\bar{E}$ denote expectation with respect to $\bar{P}$. Then for any random variable $X$, $E[X] = \bar{E}[X L_T]$, where the likelihood ratio $L_T$ is given by

$$L_T = \exp\left(\int_0^T r_t \, dt + \log B(T)\right).$$

In particular, if we take $X = \exp(-\int_0^T r_t \, dt)$, we know that $E[X] = B(T)$ and therefore $B(T)$ is the expectation under $\bar{P}$ of $X L_T$; i.e., of

$$\exp\left(-\int_0^T r_t \, dt\right) \times \exp\left(\int_0^T r_t \, dt + \log B(T)\right).$$

But this simplifies to $B(T)$ itself, meaning that we obtain a zero-variance estimator of the bond price by switching to the new probability measure. Moreover, Andersen shows that sample paths of $r_t$ can be generated under $\bar{P}$ simply by applying a change of drift to the original process.

As described above, the method would appear to require knowledge of the solution for its implementation. Nevertheless, the method has two important applications. The first is in the pricing of contingent claims. Because $\bar{P}$ eliminates the variance of bond prices, it should be effective in reducing variance for pricing, e.g., European bond options expiring at time $T$. Andersen's numerical results bear this out. A second application is in the pricing of bonds under models with no closed-form solutions: Andersen's results show that the change of drift derived from a tractable model (like CIR or Vasicek) remains effective when applied to an intractable model, and this significantly expands the scope of the method.

Importance sampling is frequently used to make rare events less rare; this is already suggested in Reider's (1993) application to out-of-the-money options. Our next example further highlights this aspect through a new application to barrier options. We consider a knock-in option far from the barrier and use importance sampling to increase the probability of a payout.

Suppose the barrier is monitored at discrete times $n\Delta t$, $n = 0, 1, \ldots, m$, with $\Delta t = T/m$. Set the barrier at $H = S_0 e^{-b}$ and the strike at $K = S_0 e^{c}$, with $b, c > 0$. A down-and-in call pays $S_T - K$ at time $T$ if $S_T > K$ and $S_{n\Delta t} < H$ for some $n = 1, \ldots, m$. We can write the price of the underlying at monitoring instants as

$$S_{n\Delta t} = S_0 e^{U_n}, \quad U_n = \sum_{i=1}^{n} X_i,$$

with the $X_i$ i.i.d. normal having mean $(r - \tfrac{1}{2}\sigma^2)\Delta t$ and variance $\sigma^2 \Delta t$. Let $\tau$ be the first time $U_n$ drops below $-b$; then the probability of a payout is $P(\tau < m, U_m > c)$. If $b$ and $c$ are large, this probability is small, and most simulation runs return zero. Through importance sampling, we can increase this probability and thus get more information out of each run.

Consider alternative probability measures $P_{\mu_1,\mu_2}$ that give $U_n$ a drift of $\mu_1 \Delta t$ until $\tau$ and then switch the drift to $\mu_2 \Delta t$. Intuitively, we would like to make $\mu_1 < 0$ to drive the asset price to the barrier and then make $\mu_2 > 0$ to drive it above the strike. For any $\mu_1, \mu_2$, we have

$$P(\tau < m, U_m > c) = E_{\mu_1,\mu_2}\big[L_{\mu_1,\mu_2} 1_{\{\tau < m,\, U_m > c\}}\big].$$

The likelihood ratio is given by

$$L_{\mu_1,\mu_2} = \exp\big(-\theta_1 U_\tau + \psi(\theta_1)\tau - \theta_2 (U_m - U_\tau) + \psi(\theta_2)(m - \tau)\big),$$

where $\theta_i = (\mu_i - r + \tfrac{1}{2}\sigma^2)/\sigma^2$, $i = 1, 2$, and $\psi(\theta) = (r - \tfrac{1}{2}\sigma^2)\Delta t\,\theta + \tfrac{1}{2}\sigma^2 \Delta t\,\theta^2$. This follows from algebraic simplification of the product of the ratios of the densities of the $X_i$ under the original and new means.

It remains to choose $\mu_1, \mu_2$. Intuitively, most of the variability in $L_{\mu_1,\mu_2}$ comes from $\tau$ (the time of the barrier crossing): for large $b, c$, in the event of a payout we expect to have $U_\tau \approx -b$ and $U_m \approx c$, so these terms should contribute less variability. If we choose $\mu_1, \mu_2$ so that $\psi(\theta_1) = \psi(\theta_2)$, the likelihood ratio simplifies to

$$L_{\mu_1,\mu_2} = \exp\big(-(\theta_1 - \theta_2) U_\tau - \theta_2 U_m + m\psi(\theta_2)\big),$$

which depends on $\tau$ only through $U_\tau \approx -b$. The condition $\psi(\theta_1) = \psi(\theta_2)$ translates to $\mu_1 = -\mu_2 \equiv -\mu$, so it only remains to choose this drift parameter. We choose it so that the time to traverse the straight line path from 0 to $-b$ and then to $c$ at rate $\mu$ equals the number of steps $m$:

$$\frac{b}{\mu \Delta t} + \frac{b + c}{\mu \Delta t} = m;$$

i.e., $\mu = (2b + c)/T$. Interestingly, this change of drift does not depend on the original mean increment $(r - \tfrac{1}{2}\sigma^2)\Delta t$. Table 4 illustrates the performance of this method.
The computational effort with and without importance sampling is essentially the same, so the efficiency improvement is just the ratio of the variances. The improvement varies widely but shows the potential for dramatic gains from importance sampling, particularly when the barrier is far from the current price of the underlying.¹¹

¹¹ The standard errors in the table are all quite small, but so are the associated option values. Hence, the relative error without importance sampling is quite significant.


Table 4. Standard errors for down-and-in calls: importance sampling.

H      K       No variance    Importance    Efficiency
               reduction      sampling      ratio
T = 0.25, m = 50
92     100     0.003 09       0.000 69      20
92     105     0.001 29       0.000 14      85
88     96      0.001 10       0.000 11      96
85     90      0.000 84       0.000 08      116
T = 1, m = 250
92     105     0.014 18       0.005 41      7
85     105     0.003 28       0.000 38      75
75     96      0.000 30       0.000 01      1124
75     85      0.001 48       0.000 10      222

All results are based on n = 100 000 simulation trials. The parameters are: S0 = 95, σ = 0.15, and r = 0.05, with the barrier H and strike K varying as indicated. The first four cases have T = 0.25 and m = 50; the last four have T = 1 and m = 250.
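The two-drift change of measure for the down-and-in call can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for Table 4: the function name `down_and_in_call`, the seed and the reduced sample size n = 4000 are choices made here, and the parameters are those of the first row of the table. The log likelihood ratio is accumulated step by step as the sum of $-\theta X_i + \psi(\theta)$, which is the unsimplified form of the expression given in the text; passing the original mean rate for both drifts makes every $\theta$ zero and recovers ordinary Monte Carlo.

```python
import math
import random
from statistics import mean, stdev


def down_and_in_call(s0, strike, barrier, r, sigma, T, m, mu1, mu2, n, rng):
    """Price estimate and standard error for a discretely monitored down-and-in call.

    Increments are drawn with mean mu1*dt up to and including the first barrier
    crossing and with mean mu2*dt afterwards, then reweighted by the likelihood ratio.
    """
    dt = T / m
    a = r - 0.5 * sigma**2                 # original mean rate of the log price
    sd = sigma * math.sqrt(dt)
    b = math.log(s0 / barrier)

    def psi(theta):                        # log moment generating function of X_i
        return a * dt * theta + 0.5 * sigma**2 * dt * theta**2

    theta1, theta2 = (mu1 - a) / sigma**2, (mu2 - a) / sigma**2
    samples = []
    for _ in range(n):
        u, log_lr, crossed = 0.0, 0.0, False
        for _ in range(m):
            mu, theta = (mu2, theta2) if crossed else (mu1, theta1)
            x = rng.gauss(mu * dt, sd)
            log_lr += -theta * x + psi(theta)
            u += x
            if u < -b:
                crossed = True
        payoff = max(s0 * math.exp(u) - strike, 0.0) if crossed else 0.0
        samples.append(math.exp(-r * T) * payoff * math.exp(log_lr))
    return mean(samples), stdev(samples) / math.sqrt(n)


s0, strike, barrier, r, sigma, T, m = 95.0, 100.0, 92.0, 0.05, 0.15, 0.25, 50
b, c = math.log(s0 / barrier), math.log(strike / s0)
mu = (2 * b + c) / T                       # drift choice described in the text
a = r - 0.5 * sigma**2

rng = random.Random(7)                     # arbitrary seed
plain_price, plain_se = down_and_in_call(s0, strike, barrier, r, sigma, T, m, a, a, 4000, rng)
is_price, is_se = down_and_in_call(s0, strike, barrier, r, sigma, T, m, -mu, mu, 4000, rng)
```

The squared ratio `(plain_se / is_se) ** 2` plays the role of the efficiency ratio in Table 4, though with far fewer trials it will only roughly resemble the tabulated value.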

In recent work, Andersen and Brotherton-Ratcliffe (1996) and Beaglehole, Dybvig, and Zhou (1997) show how to eliminate the bias caused by using a simulation at a discrete set of times to price continuous options on extrema, e.g., barrier or lookback options.

2.8 Conditional Monte Carlo

This approach to efficiency improvement exploits the variance reducing property of conditional expectation: for any random variables $X$ and $Y$, $\mathrm{Var}[E[X|Y]] \le \mathrm{Var}[X]$, with strict inequality except in trivial cases.¹² In replacing an estimator by its conditional expectation we reduce variance essentially because we are doing part of the integration analytically and leaving less to be done by Monte Carlo.

¹² This is a direct consequence of Jensen's inequality for conditional expectations.

Hull and White (1987) use this idea to price options with stochastic volatilities. Consider a model in which an asset price and its volatility evolve as follows:

$$dS = rS \, dt + \nu S \, dW_1,$$
$$d\nu^2 = \alpha \nu^2 \, dt + \xi \nu^2 \, dW_2,$$

with $W_1, W_2$ independent. Suppose we want to price a standard European call on $S$. A straightforward approach simulates sample paths of $\nu$ and $S$ up to time $T$ and averages $\max\{S_T - K, 0\}$ over all paths. An alternative notes that, conditional on the path of $\nu_t$ in $[0, T]$, the asset price $S_t$ may be treated as having a time-varying but deterministic volatility. Thus, conditional on the volatility path, the option can be priced by the Black–Scholes formula:

$$e^{-rT} E[\max\{S_T - K, 0\} \mid \nu_t, 0 \le t \le T] = BS(S_0, K, r, T, \sqrt{V_T}),$$

where

$$V_T = \frac{1}{T} \int_0^T \nu_t^2 \, dt$$

is the average squared volatility over the path, and $BS(S, K, r, T, \sigma)$ is the Black–Scholes price of a call with constant volatility $\sigma$ and the other parameters as indicated. Using this conditional expectation as the estimator is sure to reduce variance and may even reduce computational effort since it obviates simulation of $S$. It is worth emphasizing that both straightforward Monte Carlo and conditional Monte Carlo would have to be applied to discrete-time approximations of the continuous processes above. Also, the applicability of conditional Monte Carlo in this setting relies on the fact that the evolution of the asset price does not influence the volatility path. See Willard (1997) for an extension to the case of correlated $W_1$ and $W_2$.

As a further illustration of the use of conditional Monte Carlo, we give a new application to the pricing of a down-and-in call with a discretely monitored barrier. Let $0 = t_0 < t_1 < \cdots < t_m = T$ be the monitoring instants and $S_{t_i}$ the price of the underlying at the $i$th such instant. The option price is

$$E[e^{-rT} \max\{S_T - K, 0\} 1_{\{\tau_H \le T\}}],$$

where $H$ is the barrier and $\tau_H$ is the first monitoring time at which the barrier is breached. Straightforward simulation generates paths of the underlying and evaluates the estimator $e^{-rT} \max\{S_T - K, 0\} 1_{\{\tau_H \le T\}}$. Our first alternative conditions on $\{S_0, \ldots, S_{\tau_H}\}$, the path of the underlying until the barrier crossing; i.e.,

$$E[e^{-rT} \max\{S_T - K, 0\} 1_{\{\tau_H \le T\}}] = e^{-rT} E\big[E[\max\{S_T - K, 0\} 1_{\{\tau_H \le T\}} \mid S_0, \ldots, S_{\tau_H}]\big] = e^{-rT} E[BS(S_{\tau_H}, K, r, T - \tau_H, \sigma) 1_{\{\tau_H \le T\}}].$$

This yields the estimator

$$\mathrm{CMC}_1 = e^{-rT} BS(S_{\tau_H}, K, r, T - \tau_H, \sigma) 1_{\{\tau_H \le T\}}.$$

This says: simulate until the barrier is crossed or the option expires; if the barrier was crossed, return the Black–Scholes price starting from price $S_{\tau_H}$ with maturity $T - \tau_H$.
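The first conditional estimator can be sketched in Python as below, using the parameters quoted for Table 5 but a smaller, arbitrary sample size. The names `bs_call` and `down_and_in` are hypothetical. One bookkeeping point is an assumption of this sketch: the helper `bs_call` returns a price that is already discounted from $T$ back to the crossing time, so the sketch multiplies it by $e^{-r\tau_H}$ only. If $BS$ is instead read as an undiscounted conditional expected payoff, the prefactor $e^{-rT}$ written in the text applies.

```python
import math
import random
from statistics import NormalDist, mean, stdev

N = NormalDist().cdf


def bs_call(s, strike, r, tau, sigma):
    """Black-Scholes call price with time tau to maturity (payoff if tau == 0)."""
    if tau <= 0.0:
        return max(s - strike, 0.0)
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / strike) + (r + 0.5 * sigma**2) * tau) / vol
    return s * N(d1) - strike * math.exp(-r * tau) * N(d1 - vol)


def down_and_in(s0, strike, barrier, r, sigma, T, m, n, rng, conditional):
    """Estimate and standard error for a down-and-in call with m monitoring times."""
    dt = T / m
    drift, sd = (r - 0.5 * sigma**2) * dt, sigma * math.sqrt(dt)
    samples = []
    for _ in range(n):
        s, value, crossed = s0, 0.0, False
        for i in range(1, m + 1):
            s *= math.exp(drift + sd * rng.gauss(0.0, 1.0))
            if not crossed and s <= barrier:
                crossed = True
                if conditional:
                    # Stop simulating: price the remaining life analytically.
                    tau_left = (m - i) * dt
                    value = math.exp(-r * i * dt) * bs_call(s, strike, r, tau_left, sigma)
                    break
        if crossed and not conditional:
            value = math.exp(-r * T) * max(s - strike, 0.0)
        samples.append(value)
    return mean(samples), stdev(samples) / math.sqrt(n)


rng = random.Random(42)                    # arbitrary seed
args = (100.0, 100.0, 95.0, 0.10, 0.4, 0.5, 10, 5000, rng)
base_price, base_se = down_and_in(*args, conditional=False)
cmc1_price, cmc1_se = down_and_in(*args, conditional=True)
```

Besides lowering the variance, the conditional version stops each path at the crossing time, which is one reason it can also be cheaper to run.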


Our second alternative conditions one step earlier, at each monitoring instant evaluating the probability that the barrier will be breached for the first time at the next monitoring instant:

$$
\begin{aligned}
E[e^{-rT} \max(S_T - K, 0) 1_{\{\tau_H \le T\}}]
&= e^{-rT} E\Big[\max\{S_T - K, 0\} \sum_{n=1}^{m} 1_{\{\tau_H = t_n\}}\Big] \\
&= e^{-rT} E\Big[\sum_{n=1}^{m} E[\max\{S_T - K, 0\} 1_{\{\tau_H = t_n\}} \mid S_{t_0}, \ldots, S_{t_{n-1}}]\Big] \\
&= e^{-rT} E\Big[\sum_{n=0}^{\tau_H - 1} BS2(S_{t_n}, K, H, r, t_{n+1} - t_n, T - t_n, \sigma)\Big],
\end{aligned}
$$

where $BS2(S, K, H, r, t, T, \sigma)$ is the price of a down-and-in call that knocks in only if the underlying is below $H$ at time $t$. We thus arrive at the estimator

$$\mathrm{CMC}_2 = e^{-rT} \sum_{n=0}^{\tau_H - 1} BS2(S_{t_n}, K, H, r, t_{n+1} - t_n, T - t_n, \sigma),$$

with

$$BS2(S, K, H, r, t, T, \sigma) = S N_2(a_1, b_1, \rho) - e^{-rT} K N_2(a_2, b_2, \rho),$$

where $\rho = -\sqrt{t/T}$, $N_2$ is the bivariate cumulative normal distribution with correlation $\rho$, and

$$a_1 = \frac{\log(S/K) + (r + \tfrac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}, \quad a_2 = a_1 - \sigma\sqrt{T},$$

$$b_1 = \frac{\log(H/S) - (r + \tfrac{1}{2}\sigma^2)t}{\sigma\sqrt{t}}, \quad b_2 = b_1 + \sigma\sqrt{t}.$$

(The derivation of this formula is fairly standard and therefore omitted.) The $\mathrm{CMC}_2$ estimator can be expected to have lower variance than the $\mathrm{CMC}_1$ estimator because it conditions on less information and thus does more integration analytically. In fact, $\mathrm{CMC}_2$ is not a conditional Monte Carlo estimator in the strict sense because it conditions on different information at different times, making it more precisely a filtered Monte Carlo estimator in the sense of Glasserman (1996).

Because the two estimators above have the same expectation, their difference has mean 0 and can be used as a control variate to form a further estimator

$$\mathrm{CMC} = \mathrm{CMC}_1 + \beta(\mathrm{CMC}_2 - \mathrm{CMC}_1).$$

With $\beta$ optimized, this has lower variance than either individual estimator. Numerical results appear in Table 5. As expected, each level of conditioning further reduces variance, and the combined estimator achieves the lowest standard


Table 5. Comparison of CMC estimators for down-and-in call.

Method    Standard      Computation    s√t
          error (s)     time (t)
Base      0.108         0.133          0.039
CMC1      0.034         0.117          0.012
CMC2      0.021         3.233          0.038
CMC       0.014         3.367          0.026

Results based on n = 10 000 replications with σ = 0.4, r = 0.10, S0 = K = 100, H = 95, T = 0.5, and 10 equally spaced monitoring times.

error of all. However, repeated evaluation of the function BS2 turns out to be time-consuming, making CMC1 overall the most efficient estimator.
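The choice of β in the combined estimator CMC = CMC1 + β(CMC2 − CMC1) can be illustrated with a short sketch. The function name `combine` is hypothetical, and the paired samples below are synthetic stand-ins generated only to exercise the code; they are not output from the estimators of Table 5. The coefficient is the usual variance-minimizing control variate weight, β = −Cov(CMC1, D)/Var(D) with D = CMC2 − CMC1, estimated from the sample.

```python
import random
from statistics import mean, pvariance


def combine(est1, est2):
    """Return (beta, combined samples) for est1 + beta * (est2 - est1)."""
    diff = [b - a for a, b in zip(est1, est2)]
    m1, md = mean(est1), mean(diff)
    cov = sum((a - m1) * (d - md) for a, d in zip(est1, diff))
    var_d = sum((d - md) ** 2 for d in diff)
    beta = -cov / var_d if var_d > 0 else 0.0   # minimizes the in-sample variance
    return beta, [a + beta * d for a, d in zip(est1, diff)]


# Synthetic paired replications with a common mean of 1.0 (illustration only).
rng = random.Random(3)
common = [rng.gauss(0.0, 1.0) for _ in range(2000)]
est1 = [1.0 + z + rng.gauss(0.0, 0.5) for z in common]
est2 = [1.0 + 0.5 * z + rng.gauss(0.0, 0.3) for z in common]

beta, combined = combine(est1, est2)
variances = (pvariance(est1), pvariance(est2), pvariance(combined))
```

Since β = 0 and β = 1 reproduce the two individual estimators, the in-sample variance of the combination cannot exceed either of theirs; whether the gain justifies the extra computation is the efficiency question raised in connection with Table 5.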

3 Low-discrepancy sequences

For complex problems the performance of the basic Monte Carlo approach may be rather unsatisfactory because the error is $O(1/\sqrt{n})$. We can sometimes improve convergence by using pre-selected deterministic points to evaluate the integral. The accuracy of this approach depends on the extent to which these deterministic points are evenly dispersed throughout the domain of integration. Discrepancy measures the extent to which the points are evenly dispersed throughout a region: the more evenly dispersed the points are, the lower the discrepancy. Low-discrepancy sequences are often called quasi-random sequences even though they are not at all random.¹³ We shall use both terms in this paper.

¹³ Thus the name quasi-random is very misleading since these sequences are deterministic. However, it seems to be sanctioned by usage.

Low-discrepancy methods have recently been used to tackle a number of problems in finance. These applications are more fully described in papers by Birge (1994), Joy, Boyle, and Tan (1996) and Paskov and Traub (1995); the use of quasi-Monte Carlo is also proposed in Cheyette (1992). In this section we describe how the approach works and review some of the recent applications.

The book by Press et al. (1992) provides an intuitive introduction to low-discrepancy sequences and quasi-Monte Carlo methods. Spanier and Maize (1994) provide a recent overview of quasi-random methods and how they can be used to evaluate integrals with medium sized samples. Niederreiter (1992) and Tezuka (1995) provide in-depth analyses of low-discrepancy sequences. Moskowitz and Caflisch (1996) discuss recent developments in improving the convergence of quasi-random Monte Carlo methods. In earlier work, Haselgrove (1961) describes a method for multivariate integration that can be applied to security pricing. Haselgrove's method is developed for problems of eight dimensions or less and our numerical experiments suggest that it is competitive with the low-discrepancy sequences investigated in this section for problems of this size.

The basic idea behind the approach is quite intuitive and is readily explained in the one-dimensional case. Suppose we wish to integrate a function $f(x)$ over the interval $[0, 1]$ using a sequence of $n$ points. Rather than pick a random sequence, suppose we pick a deterministic sequence of points that are, in some sense, evenly distributed. With this choice, the accuracy of the estimate will be higher than that obtained using the crude Monte Carlo approach. If we use an equally spaced grid we obtain the trapezoidal method of numerical integration, which has an error of $O(n^{-1})$.

However, the more challenging task is to evaluate multi-dimensional integrals. Without loss of generality we can assume that the domain of integration is contained in the $d$-dimensional unit hypercube. The advantages of the uniformly spaced grid in the one-dimensional case do not carry over to higher dimensions. The principal reason is that the error bound for the $d$-dimensional trapezoidal rule is $O(n^{-2/d})$. In addition, if we use an evenly spaced Cartesian grid, we would have to decide the number of points in advance to achieve uniformity. This is restrictive because, in numerical applications, we would like to be able to add points sequentially until some termination criterion is met. Low-discrepancy sequences have the property that as successive points are added the entire sequence of points still remains more or less evenly dispersed throughout the region.

Niederreiter (1992) gives a detailed analysis of the discrepancy of a sequence. Here, we just briefly recall the definition. Suppose we have a sequence of $n$ points $\{x_1, x_2, \ldots, x_n\}$ in the $d$-dimensional half-open unit cube, $I^d = [0, 1)^d$, and a subset $J$ of $I^d$. We define

$$D(J; n) = \frac{A(J; n)}{n} - V(J),$$

where $A(J; n)$ is the number of $k$, $1 \le k \le n$, with $x_k \in J$ and $V(J)$ is the volume of $J$. The discrepancy, $D_n$, of the sequence is defined to be the supremum of $|D(J; n)|$ over all $J$. The star discrepancy, $D_n^*$, is obtained by taking the supremum over sets $J$ of the form

$$\prod_{i=1}^{d} [0, u_i).$$

In the one-dimensional case there is a simple explicit form for the (star)¹⁴ discrepancy of a sequence of $n$ points. If we label the points so that $0 \le x_1 \le \cdots \le x_n \le 1$, then the discrepancy of this sequence is

$$D_n^* = \frac{1}{2n} + \max_{k=1,\ldots,n} \left| x_k - \frac{2k-1}{2n} \right|.$$

We can see that the star discrepancy is at least $1/(2n)$ and that the lowest value is attained when

$$x_k = \frac{2k-1}{2n}, \quad 1 \le k \le n.$$

In higher dimensions there is no simple form for the discrepancy of a sequence.

¹⁴ For the rest of the paper we simply use the term discrepancy rather than star discrepancy to refer to $D_n^*$.

There are several examples of low-discrepancy sequences, including the sequences proposed by Halton (1960), Sobol' (1967), Faure (1982), and Niederreiter (1988).¹⁵ For these sequences the asymptotic form of the star discrepancy has been shown to be

$$D_n^* = O\left(\frac{(\log n)^d}{n}\right).$$

This bound for the discrepancy involves a constant which in general depends on the dimension $d$ of the sequence. These constants are very difficult to estimate accurately in high dimensions. For large values of $d$ the constants "are often ridiculously large for reasonable values of n" according to Spanier and Maize (1994, p. 23). Furthermore, for high dimensions it may take a long time before the discrepancy reaches its asymptotic level. Morokoff and Caflisch (1995) note that for intermediate values of $n$ the discrepancy may be $O(\sqrt{n})$. They suggest that the transition to $O(n^{-1}(\log n)^d)$ occurs at around values of $n = e^d$. For large $d$ this will be an enormous number.

The error in numerical integration using a low-discrepancy sequence admits a deterministic bound. The bound reflects both the discrepancy of the sequence of points used to evaluate the integral as well as the regularity of the function. The result is contained in the following theorem.

Theorem (Koksma–Hlawka) Let $I^d = [0, 1)^d$ and let $f$ have bounded variation $V(f)$ on $[0, 1]^d$ in the Hardy–Krause¹⁶ sense. Then for any $x_1, x_2, \ldots, x_n \in I^d$ we have

$$\left| \frac{1}{n} \sum_{k=1}^{n} f(x_k) - \int_{I^d} f(u) \, du \right| \le V(f) D_n^*.$$

¹⁵ Interestingly, linear congruential generators (frequently used to generate the pseudo-random numbers that drive ordinary Monte Carlo) produce sets of points with low discrepancy over the entire period of the generator; see Niederreiter (1976). This suggests the possibility of choosing such a generator with period roughly equal to the total number of points required as a type of quasi-Monte Carlo method. In ordinary Monte Carlo, one prefers instead that the period be many orders of magnitude larger than the number of points required. We thank Peter Hellekalek of the University of Salzburg for this observation.

¹⁶ For a more complete discussion of the Hardy–Krause definition of variation and details on this theorem see Niederreiter (1992).
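The one-dimensional discrepancy formula and the Koksma–Hlawka bound can be checked numerically with a short sketch. The choices here are illustrative assumptions: the point set is the base-2 van der Corput sequence (the one-dimensional case of Halton's construction), the integrand is $f(x) = x^2$, which is monotone on $[0, 1]$ so that its variation is $f(1) - f(0) = 1$, and the function names are hypothetical.

```python
def van_der_corput(k, base=2):
    """k-th point (k >= 1) of the van der Corput sequence: the digits of k reversed."""
    x, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, base)
        denom *= base
        x += digit / denom
    return x


def star_discrepancy_1d(points):
    """Exact star discrepancy of a finite point set in [0, 1] via the explicit formula."""
    xs = sorted(points)
    n = len(xs)
    return 1 / (2 * n) + max(abs(x - (2 * k - 1) / (2 * n)) for k, x in enumerate(xs, 1))


def f(x):
    return x * x                  # variation V(f) = 1 on [0, 1]


n = 1000
qmc_points = [van_der_corput(k) for k in range(1, n + 1)]
midpoints = [(2 * k - 1) / (2 * n) for k in range(1, n + 1)]

exact = 1 / 3                     # integral of x^2 over [0, 1]
qmc_error = abs(sum(f(x) for x in qmc_points) / n - exact)
bound = 1.0 * star_discrepancy_1d(qmc_points)   # V(f) * D_n^*
best_possible = star_discrepancy_1d(midpoints)  # equals 1 / (2n)
```

Comparing `qmc_error` with `bound` shows how loose the inequality can be even for this simple integrand.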


The error bound provided by this theorem, while it is of theoretical interest, is of little help in most practical situations. The theoretical bound normally overestimates the actual error by a wide margin and V ( f ) may be difficult to evaluate or even approximate. We have noted that the constants buried in the bounds for the discrepancy are large. Another reason for the coarseness of the bound is that the Koksma–Hlawka theorem does not reflect additional smoothness in f . Intuitively we would expect the approximation to be better as f becomes smoother. In finance applications the payoffs are normally continuous functions of the variables (with some important exceptions – payoffs on digital and barrier options are discontinuous), but may not be sufficiently smooth to have finite variation because of functions like “max” embedded in the payoffs. Hlawka (1971) provides an alternative bound under weaker smoothness requirements. To date, studies using low-discrepancy sequences in finance applications find that the errors produced are substantially lower than the corresponding errors generated by crude Monte Carlo. Joy, Boyle, and Tan (1996) used Faure sequences to price several complex derivative securities. They found that the quasi-Monte Carlo approach resulted in significantly smaller errors than the standard Monte Carlo approach. They confirmed that the actual error bound (for cases in which it could be computed precisely) was dramatically less than the bound computed from the Koksma–Hlawka inequality. Paskov and Traub (1995) used both Sobol’ sequences and Halton sequences to evaluate mortgage-backed security prices. 
Their work involves the evaluation of integrals with dimensions up to 360; they find that Sobol’ sequences are more efficient than Halton sequences and that the quasi-random approach outperforms the standard Monte Carlo approach for these types of problems.17 Paskov and Traub’s results stand in contrast to the claim sometimes found in the literature18 that the superiority of low-discrepancy algorithms vanishes for intermediate values of $d$ around 30. Bratley, Fox, and Niederreiter (1992) conducted practical numerical experiments using low-discrepancy sequences and conclude that standard Monte Carlo is superior to quasi-Monte Carlo for high dimensions, say greater than 12. They used Sobol’ and Niederreiter sequences in their tests. They conclude that in high dimensions, “quasi-Monte Carlo seems to offer no practical advantage over pseudo-Monte Carlo because the discrepancy bound for the former is far larger than $\sqrt{n}$ for $n = 2^{30}$, say.” (In a personal communication, Fox adds that the crossover probably depends a lot on the sequence.) The reason for the difference between this verdict and the results of the finance applications may be that the integrands typically found in finance applications behave better than those used by numerical analysts19 to compare different algorithms. Another important consideration is that financial applications typically involve discounting, and this may effectively reduce dimensionality; for example, some of the 360 months in the life of a mortgage may have little influence on the value of a mortgage-backed security. Nevertheless, the experience of Bratley et al. (1992) serves as a useful caution against assuming that quasi-Monte Carlo will outperform standard Monte Carlo in all situations.

17 Bratley et al. (1992) note that the Niederreiter sequence they tested theoretically beats Sobol’ sequences in dimensions higher than seven.

18 See, for example, Rensburg and Torrie (1993) or Morokoff and Caflisch (1995).

Some theoretical differences among low-discrepancy sequences can be understood through the concepts of $(t, m, s)$-nets and $(t, s)$-sequences; these are discussed in detail in Niederreiter (1992). Briefly, an elementary interval in base $b$ in dimension $s$ is a set of the form
\[
\prod_{j=1}^{s} \left[ \frac{a_j}{b^{k_j}}, \frac{a_j + 1}{b^{k_j}} \right)
\]
with $k_j$, $a_j$ nonnegative integers and $a_j < b^{k_j}$. A $(t, m, s)$-net (with $0 \le t \le m$) is a set of $b^m$ points in the $s$-dimensional hypercube such that every elementary interval of volume $b^{t-m}$ contains $b^t$ points. Speaking loosely, this means that the proportion of points in each sufficiently large box equals the volume of the box. Smaller $t$ implies greater uniformity. An infinite sequence forms a $(t, s)$-sequence if for all $m \ge t$ certain finite subsequences of length $b^m$ form $(t, m, s)$-nets in base $b$. Sobol’ points are $(t, s)$-sequences in base 2 and Faure points are $(0, s)$-sequences in prime bases not less than $s$. Thus, Faure points achieve the smallest value of $t$, but at the expense of a large base. A smaller base implies that uniformity holds over shorter subsequences.

An important issue in the use of quasi-Monte Carlo concerns the termination criterion, since the Koksma–Hlawka bound is often of little practical value. Various heuristics are available. Birge (1994) suggests that a rough bound may be obtained by tracking the maximum and minimum values over a period that shows equal numbers of increases and decreases.
For instance, the criterion could be to stop at the first set of two thousand observations in which the numbers of increases and decreases are within ten percent of each other. He suggests that the maximum and minimum realized values could be used as bounds on the true value. Fox (1986) suggests that we compare the estimate of the integral based on a sample of $2n$ points with the estimate based on $n$ points and stop if the two estimates agree to within some tolerance level. Paskov and Traub (1995) use a similar termination criterion based on successive errors: stop when the difference between two consecutive approximations using $10\,000\,i$, $i = 1, 2, \ldots, 1000$, sample points falls below some threshold. Owen (1995a, 1995b) proposes a hybrid of Monte Carlo and low-discrepancy methods which provides error estimates and has good convergence properties. In addition to these approaches, one can also run standard Monte Carlo at the outset and use the probabilistic error term to assess when enough low-discrepancy points have been used in the quasi-random calculation. This benchmarking with standard Monte Carlo would be useful if the same set of calculations were being carried out frequently with only slightly different input values. This situation is common in finance applications. There is often a need to perform the same set of calculations frequently; e.g., the risk analysis of a book of business at the end of each day. In these cases one can conduct experiments to see which sets of low-discrepancy sequences provide the best results. The right number of low-discrepancy points could be determined just once at the outset.

19 For example, one of the integrals used by Bratley, Fox, and Niederreiter (1992) was
\[
\int_0^1 \cdots \int_0^1 \prod_{k=1}^{d} k \cos(k x_k) \, dx_1 \cdots dx_d .
\]
This integrand is highly periodic for large values of $d$.

Before leaving this section, we should mention some recent advances and new techniques to improve the performance of quasi-random Monte Carlo. Niederreiter and Xing (1996), Tezuka (1994), and Ninomiya and Tezuka (1996) have proposed new low-discrepancy sequences that appear to have the potential to perform substantially better than previous methods. We have noted that the efficiency of quasi-random Monte Carlo improves as the integrand becomes smoother. Moskowitz and Caflisch (1996) illustrate smoothing procedures that can be used for this purpose. It is sometimes possible to enhance the performance of quasi-random sequences by reducing the effective dimension of the problem. Moskowitz and Caflisch also indicate how this can be accomplished in the discretization of a Wiener process and in the solution of the Feynman–Kac equation.
This is relevant for finance applications since the prices of derivative securities have a Feynman–Kac representation. See Acworth, Broadie, and Glasserman (1997), Berman (1996), and Caflisch, Morokoff, and Owen (1998) for recent work applying low-discrepancy sequences with alternative constructions of Wiener processes. Spanier and Maize (1994) discuss a battery of techniques that can be used to improve the performance of quasi-Monte Carlo methods for relatively small sample sizes. Next we compare the Monte Carlo method using pseudo-random numbers with the Faure, Halton, and Sobol’ low-discrepancy methods.
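The doubling heuristic attributed above to Fox (1986), comparing the estimate from $n$ points with the estimate from $2n$ points, can be sketched as follows. This is only an illustration under stated assumptions: it uses a Halton sequence built from a radical-inverse function, and the function names, the starting size, the tolerance, and the cap `n_max` are hypothetical choices, not taken from any of the cited papers.

```python
import math


def radical_inverse(index: int, base: int) -> float:
    """Reflect the base-`base` digits of `index` about the radix point."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def halton_point(index: int, bases):
    """Point number `index` (1-based) of the Halton sequence with the given bases."""
    return [radical_inverse(index, b) for b in bases]


def integrate_by_doubling(f, bases, n_start=64, tol=1e-4, n_max=2 ** 16):
    """Average f over Halton points, doubling n until two successive
    estimates (n points vs. 2n points) agree to within tol."""
    total = sum(f(halton_point(i, bases)) for i in range(1, n_start + 1))
    count = n_start
    estimate = total / count
    while 2 * count <= n_max:
        # Extend the running sum from `count` points to `2 * count` points.
        total += sum(f(halton_point(i, bases))
                     for i in range(count + 1, 2 * count + 1))
        count *= 2
        previous, estimate = estimate, total / count
        if abs(estimate - previous) <= tol:
            break
    return estimate, count


if __name__ == "__main__":
    # Smooth two-dimensional test integrand with a known integral.
    value, used = integrate_by_doubling(lambda x: math.exp(-x[0] - x[1]), (2, 3))
    print(value, used, (1.0 - math.exp(-1.0)) ** 2)
```

Agreement between successive estimates is a heuristic, not an error bound: two consecutive estimates can agree by accident while both are still far from the true value.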

3.1 Numerical results

For an initial comparison, we test the methods on the problem of pricing a European option on a single underlying asset with the usual Black–Scholes assumptions. In this framework, the Black–Scholes formula can be evaluated to give the true option values in order to compare alternative methods. Rather than using a single option, we evaluate the methods on a random sample of 500 options. The probability distribution of the parameters is chosen to represent a reasonable range of values in practical applications.20 The error measure that we use is root-mean-squared (RMS) relative error defined by
\[
\mathrm{RMS} = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} \left( \frac{\hat{C}_i - C_i}{C_i} \right)^2 }, \tag{12}
\]
where $i$ is the index of the $m = 500$ options in the test set, $C_i$ is the true option value, and $\hat{C}_i$ is the estimated option value. The results are given in Figure 1.

Figure 1 plots RMS relative error against the number of points, $n$. The Monte Carlo method (i.e., using pseudo-random numbers) displays the expected $O(1/\sqrt{n})$ convergence: e.g., increasing $n$ by a factor of 100 decreases the RMS error by a factor of 10. The low-discrepancy method using Faure sequences dominates the Monte Carlo method. Indeed, 129 Faure points give an error lower than 1000 Monte Carlo points. The Sobol’ method is the best of the three methods tested. Using 192 Sobol’ points gives an error lower than 10 000 Monte Carlo points.

A major consideration in the comparison of methods is the overall computation time, not just the number of points. The Sobol’ sequence numbers can be generated significantly faster than Faure numbers (see, e.g., Bratley and Fox 1988) and as fast as most pseudo-random number methods. Hence, in the important RMS error versus computation time comparison, the relative advantage of the Sobol’ method increases.

A low-discrepancy sequence will often have additional uniformity properties at certain points in the sequence (see, e.g., Fox 1986 and Bratley and Fox 1988). For example, in the Sobol’ sequence the running average returns to 0.5 at the points $n = 2^k - 1$ for $k = 1, 2, \ldots$. One might expect that choosing $n$ to be one of these “favorable” points would lead to better option price estimates. For large values of $n$, the advantage of using favorable points becomes negligible, but for small $n$ the effect can be quite significant.
Indeed, in the experiment above, using the Sobol’ points 1 through 254 gives an RMS error of 10%, while using the points 1 through 255 gives an RMS error of 4%.21 Better results are often obtained by ignoring an initial portion of a low-discrepancy sequence. For example, using the Sobol’ points 1 through 63 gives an RMS error of 13%, while using the Sobol’ points 64 through 127 gives an RMS error of 2%. In the results in Figure 1, the Sobol’ sequence was always started at point 64, so the label 192 in Figure 1 corresponds to the 192 Sobol’ points from 64 to 255. Similarly, the Faure sequence was always started at point 16, so the label 129 in Figure 1 corresponds to the 129 Faure points from 16 to 144.

20 The details of the distribution are given in Broadie and Detemple (1996).

21 We take the first point of the Sobol’ sequence to be 0.5, not 0.0.

Fig. 1. RMS relative error vs. number of points. [Log-log plot of RMS relative error ($10^0$ down to $10^{-4}$) against $n$ ($10^2$ to $10^5$) for Monte Carlo, Faure, and Sobol’ points. Labeled points: Faure at $n$ = 129, 1,137, 9,201, and 61,425; Sobol’ at $n$ = 192, 960, 8,128, and 65,472.]
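The “favorable points” effect described in this subsection can be checked directly in one dimension. The sketch below assumes that the first coordinate of the Sobol’ sequence is the base-2 van der Corput sequence started at 0.5 (the usual construction, but an assumption here); the function names are illustrative only.

```python
def van_der_corput(index: int) -> float:
    """Base-2 radical inverse: mirror the binary digits of index about the point."""
    result, scale = 0.0, 0.5
    while index > 0:
        result += (index & 1) * scale
        index >>= 1
        scale *= 0.5
    return result


def mean_of_first(n: int) -> float:
    """Running average of points 1..n, with point 1 equal to 0.5."""
    return sum(van_der_corput(i) for i in range(1, n + 1)) / n


if __name__ == "__main__":
    # n = 2**k - 1 are the favorable points; their neighbours are not.
    for n in (62, 63, 254, 255):
        print(n, mean_of_first(n))
```

The first $2^k - 1$ points are exactly the fractions $j/2^k$, $j = 1, \ldots, 2^k - 1$, so their average is 0.5; stopping one point early leaves out the largest of them and pulls the average below 0.5.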

3.2 One-dimensional vs. higher dimensional sequences

It is sometimes asserted that low-discrepancy methods can be implemented in existing simulation programs by simply replacing the pseudo-random number generator with a low-discrepancy sequence generator. This naive approach can lead to disastrous results, as the following example shows. Consider pricing a European option on the maximum of two non-dividend-paying assets with the parameters $S_1 = S_2 = K = 100$, $\sigma_1 = \sigma_2 = 0.2$, $\rho = 0.3$, $r = 0.05$, and $T = 1$. Under the usual Black–Scholes assumptions, a formula for the price of the option can be derived (see, e.g., Johnson 1987 or Stulz 1982) and gives a price of 16.442. Running one Monte Carlo simulation with 1000 points (hence 2000 random numbers) gave an estimated price of 16.279 with a standard error of 0.533. Using 2000 one-dimensional low-discrepancy values gave a price estimate of 4.320 using the Sobol’ sequence and an estimate of 1.909 using the Faure sequence (starting at point 16).

The cause of the problem can be seen by examining Figures 2–5. Figures 2 and 3 show 1000 two-dimensional Faure and Sobol’ points, respectively. The figures illustrate how the sequences fill the two-dimensional space in regular but different ways. By contrast, Figures 4 and 5 show 2000 one-dimensional Faure and Sobol’ points, respectively, plotted in two dimensions. The plots are created by taking successive points in the one-dimensional sequence to be the $(x, y)$ coordinates in two-dimensional space. In neither figure are the points filling the two-dimensional space (note that the axes do not extend from 0 to 1) and this explains why the price estimates do not converge to the correct values. Even in the quarter of the unit square where the points fall, the points do not uniformly fill the space. This problem is reminiscent of the well-known “collinearity” or “hyperplane” problem of some pseudo-random number generators, but is even more serious with these low-discrepancy sequences.

A similar problem can occur if a high-dimensional low-discrepancy sequence is used for a problem of low dimension. Figure 6 shows the 49th and 50th dimension of 1000 50-dimensional Faure points. Using the last two dimensions of the 50-dimensional sequence to price a two-dimensional option will give very poor results.

Fig. 2. 1000 two-dimensional Faure points. [Scatter plot on the unit square, both axes from 0.0 to 1.0.]
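The pairing problem described in this subsection is easy to reproduce with a simpler sequence. The sketch below is an illustration, not the experiment reported in the text: it substitutes the base-2 van der Corput sequence for the one-dimensional Sobol’ and Faure sequences, and a two-dimensional Halton sequence (bases 2 and 3) for a properly constructed two-dimensional point set. All function names are hypothetical.

```python
def radical_inverse(index: int, base: int) -> float:
    """Reflect the base-`base` digits of `index` about the radix point."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def pairs_from_one_dimension(n_points: int):
    """Naive approach: consecutive values of ONE sequence become (x, y)."""
    return [(radical_inverse(2 * i - 1, 2), radical_inverse(2 * i, 2))
            for i in range(1, n_points + 1)]


def halton_2d(n_points: int):
    """Genuinely two-dimensional points: one base per coordinate."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3))
            for i in range(1, n_points + 1)]


def occupied_quadrants(points):
    """Set of quadrants of the unit square that contain at least one point."""
    return {(x >= 0.5, y >= 0.5) for x, y in points}


if __name__ == "__main__":
    print(occupied_quadrants(pairs_from_one_dimension(1000)))
    print(occupied_quadrants(halton_2d(1000)))
```

In base 2, odd-indexed values are at least 0.5 and even-indexed values are below 0.5, so every naive pair lands in the same quarter of the unit square. This matches the axis ranges noted for Figures 4 and 5.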

3.3 Higher dimensional test

To test the effect of problem dimension, we price options in dimensions $d = 10$, 50, and 100. We price discretely sampled geometric average Asian options, because the problem dimension is easily varied and a closed-form solution for the price is available (see Turnbull and Wakeman 1991). The price of a geometric average Asian option is given by
\[
C = E[e^{-rT} (\tilde{S} - K)^+],
\]
where $\tilde{S} = \left( \prod_{i=1}^{d} S_i \right)^{1/d}$ and $S_i$ is the asset price at time $iT/d$.

We test standard Monte Carlo, Monte Carlo with antithetic variates, and the low-discrepancy sequences of Faure, Sobol’, and Halton.22 For each dimension, we select 500 option parameters at random, and compute RMS relative error (see equation 12) for each method.23 Results for 50 000 and 200 000 sample points are given in Figures 7 and 8, respectively. (The antithetic method uses 25 000 and 100 000 independent pairs of points, respectively.) Results for the Halton sequence were not competitive and are suppressed. RMS error for standard Monte Carlo is nearly independent of the problem dimension. The antithetic method gives minimal variance reduction. The relative advantage, in terms of RMS error, of the low-discrepancy sequences decreases with the problem dimension. For this test problem, the crossover point is beyond dimension 100.

22 We thank Spassimir Paskov and Joseph Traub for providing their code for the Sobol’ sequences.

23 The details of the distribution are given in Broadie and Detemple (1996).

Fig. 3. 1000 two-dimensional Sobol’ points. [Scatter plot on the unit square, both axes from 0.0 to 1.0.]

Fig. 4. 2000 one-dimensional Faure points. [Scatter plot; horizontal axis from 0.50 to 1.00, vertical axis from 0.00 to 0.50.]

Fig. 5. 2000 one-dimensional Sobol’ points. [Scatter plot; horizontal axis from 0.50 to 1.00, vertical axis from 0.00 to 0.50.]

Fig. 6. Coordinates 49 and 50 of 1000 50-dimensional Faure points. [Scatter plot on the unit square, both axes from 0.00 to 1.00.]

Fig. 7. Results with 50 000 points. [RMS relative error (in percent, axis from 0.0 to 1.1) against dimension (10 to 100) for the Monte Carlo, antithetic, Faure, and Sobol’ methods.]

Fig. 8. Results with 200 000 points. [RMS relative error (in percent, axis from 0.00 to 0.45) against dimension (10 to 100) for the Monte Carlo, antithetic, Faure, and Sobol’ methods.]
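For readers who want to reproduce a single data point of the test in this subsection, the following sketch prices one discretely sampled geometric average Asian call by standard Monte Carlo and compares it with a closed-form value. The closed form used here is the usual lognormal calculation (the log of the geometric average is normal under Black–Scholes dynamics); it is an assumption of this sketch rather than a formula quoted from Turnbull and Wakeman (1991), and the function names and parameter values are illustrative.

```python
import math
import random


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def geometric_asian_call_exact(s0, k, r, sigma, t, d):
    """Closed form: ln(S~) is normal with the mean and variance below."""
    mean = math.log(s0) + (r - 0.5 * sigma ** 2) * t * (d + 1) / (2 * d)
    var = sigma ** 2 * t * (d + 1) * (2 * d + 1) / (6 * d ** 2)
    sd = math.sqrt(var)
    d2 = (mean - math.log(k)) / sd
    d1 = d2 + sd
    return math.exp(-r * t) * (math.exp(mean + 0.5 * var) * norm_cdf(d1)
                               - k * norm_cdf(d2))


def geometric_asian_call_mc(s0, k, r, sigma, t, d, n_paths, seed=0):
    """Standard Monte Carlo with d normal draws per path (problem dimension d)."""
    rng = random.Random(seed)
    dt = t / d
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    discount = math.exp(-r * t)
    total = 0.0
    for _ in range(n_paths):
        log_s, log_sum = math.log(s0), 0.0
        for _ in range(d):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
            log_sum += log_s  # accumulates ln S_1 + ... + ln S_d
        total += discount * max(math.exp(log_sum / d) - k, 0.0)
    return total / n_paths


if __name__ == "__main__":
    args = (100.0, 100.0, 0.10, 0.40, 0.2, 10)
    print(geometric_asian_call_exact(*args),
          geometric_asian_call_mc(*args, n_paths=20000))
```

With $d = 1$ the geometric average is just $S_T$, so the closed form reduces to the Black–Scholes call price, which gives a quick consistency check.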

4 Estimating price sensitivities

Most of the discussion in this paper centers on the use of Monte Carlo for pricing securities. In practice, the evaluation of price sensitivities is often as important as the evaluation of the prices themselves. Indeed, whereas prices for some securities can be observed in the market, their sensitivities to parameter changes typically cannot and must therefore be computed. Since price sensitivities are important measures of risk, the growing emphasis on risk management systems suggests a greater need for their efficient computation.

The derivatives of a derivative security’s price with respect to various model parameters are collectively referred to as Greeks, because several of these are commonly referred to with the names of Greek letters.24 Perhaps the most important of these, and the one to which we give primary attention, is delta: the derivative of the price of a contingent claim with respect to the current price of an underlying asset. The delta of a stock option, for example, is the derivative of the option price with respect to the current stock price. An option involving multiple underlying assets has multiple deltas, one for each underlying asset.

In the rest of this section, we discuss various approaches to estimating price sensitivities, especially delta. We begin by examining finite-difference approximations and show that these can be improved through the use of common random numbers. We then discuss direct methods that estimate derivatives without requiring resimulation at perturbed parameter values.

24 See, e.g., Chapter 13 of Hull (2000) for background.

4.1 Finite-difference approximations

Consider the problem of computing the delta of the Black–Scholes price of a European call; i.e., computing
\[
\Delta = \frac{dC}{dS_0},
\]
where $C$ is the option price and $S_0$ is the current stock price. There is, of course, an explicit expression for delta, so simulation is not required, but the example is useful for purposes of illustration. A crude estimate of delta is obtained by generating a terminal stock price
\[
S_T = S_0 e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z} \tag{13}
\]
(see (2) for notation) from the current stock price $S_0$ and a second, independent terminal stock price
\[
S_T(\epsilon) = (S_0 + \epsilon) e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z'} \tag{14}
\]
from the perturbed initial price $S_0 + \epsilon$, with $Z$ and $Z'$ independent. For each terminal price, a discounted payoff can be computed:
\[
\hat{C}(S_0) = e^{-rT} \max\{0, S_T - K\}, \qquad
\hat{C}(S_0 + \epsilon) = e^{-rT} \max\{0, S_T(\epsilon) - K\}
\]
(see (3) for notation). A crude estimate of delta is then provided by the finite-difference approximation
\[
\tilde{\Delta} = \epsilon^{-1} [\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)]. \tag{15}
\]

By generating $n$ independent replications of $S_T$ and $S_T(\epsilon)$ we can calculate the sample mean of $n$ independent copies of $\tilde{\Delta}$. As $n \to \infty$, this sample mean converges to the true finite-difference ratio
\[
\epsilon^{-1} [C(S_0 + \epsilon) - C(S_0)], \tag{16}
\]
where $C(\cdot)$ is the option price as a function of the current stock price. This discussion suggests that to get an accurate estimate of $\Delta$ we should make $\epsilon$ small. However, because we generated $S_T$ and $S_T(\epsilon)$ independently of each other, we have
\[
\mathrm{Var}[\tilde{\Delta}] = \epsilon^{-2} \left( \mathrm{Var}[\hat{C}(S_0 + \epsilon)] + \mathrm{Var}[\hat{C}(S_0)] \right) = O(\epsilon^{-2}),
\]
so the variance of $\tilde{\Delta}$ becomes very large if we make $\epsilon$ small. To get an estimator that converges to $\Delta$ we must let $\epsilon$ decrease slowly as $n$ increases, resulting in slow overall convergence. A general result of Glynn (1989) shows that the best possible convergence rate using this approach is typically $n^{-1/4}$. Replacing the forward difference estimator in (15) with the central difference $(2\epsilon)^{-1} [\hat{C}(S_0 + \epsilon) - \hat{C}(S_0 - \epsilon)]$ typically improves the optimal convergence rate to $n^{-1/3}$. These rates should be compared with $n^{-1/2}$, the rate ordinarily expected from Monte Carlo.

Better estimators can generally be obtained using the method of common random numbers, which, in this context, simply uses the same $Z$ in (13) and (14). Denote by $\hat{\Delta}$ the finite-difference approximation thus obtained. For fixed $\epsilon$, the sample mean of independent replications of $\hat{\Delta}$ also converges to (16). The variance is given by
\[
\mathrm{Var}[\hat{\Delta}] = \epsilon^{-2} \left( \mathrm{Var}[\hat{C}(S_0)] + \mathrm{Var}[\hat{C}(S_0 + \epsilon)] - 2\,\mathrm{Cov}[\hat{C}(S_0), \hat{C}(S_0 + \epsilon)] \right),
\]
because $\hat{C}(S_0)$ and $\hat{C}(S_0 + \epsilon)$ are no longer independent. Indeed, if they are positively correlated, then $\hat{\Delta}$ has smaller variance than $\tilde{\Delta}$. That they are in fact positively correlated follows from the monotonicity of the function mapping $Z$ to $\hat{C}$ by the argument used in our discussion of antithetics in Section 3. Thus, the use of common random numbers reduces the variance of the estimate of delta.

The impact of this variance reduction is most dramatic when $\epsilon$ is small. A simple calculation shows that, using common random numbers,
\[
|\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)| \le |S_T(\epsilon) - S_T| \le \epsilon e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z}.
\]
Because this upper bound has finite second moment, we may conclude that
\[
E[|\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)|^2] = O(\epsilon^2), \tag{17}
\]
and therefore that
\[
\mathrm{Var}[\epsilon^{-1} \{\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)\}] = O(1);
\]
i.e., the variance of $\hat{\Delta}$ remains bounded as $\epsilon \to 0$, whereas we saw previously that the variance of $\tilde{\Delta}$ increases at rate $\epsilon^{-2}$. Thus, the more precisely we try to estimate $\Delta$ (by making $\epsilon$ small) the greater the benefit of common random numbers. Moreover, this indicates that to get an estimator that converges to $\Delta$ we may let $\epsilon$ decrease faster as $n$ increases than was possible with $\tilde{\Delta}$, resulting in faster overall convergence. An application of Proposition 2 of L’Ecuyer and Perron (1994) shows that a convergence rate of $n^{-1/2}$ can be achieved in this case, and that is the best that can ordinarily be expected from Monte Carlo. For more on convergence rates using common random numbers see Glasserman and Yao (1992), Glynn (1989), and L’Ecuyer and Perron (1994).

The dramatic success of common random numbers in this example relies on the fast rate of mean-square convergence of $\hat{C}(S_0 + \epsilon)$ to $\hat{C}(S_0)$ evidenced by (17). This rate does not apply in all cases. It fails to hold, for example, in the case of a digital option25 paying a fixed amount $B$ if $S_T > K$ and 0 otherwise. The price of this option is $C = e^{-rT} B P(S_T > K)$; the obvious simulation estimator is
\[
\hat{C}(S_0) = 1_{\{S_T > K\}} e^{-rT} B.
\]
Because $\hat{C}(S_0)$ and $\hat{C}(S_0 + \epsilon)$ differ only when $S_T \le K < S_T(\epsilon)$, we have
\[
E[|\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)|^2] = B^2 e^{-2rT} P(S_T \le K < S_T(\epsilon)) = B^2 e^{-2rT} P(S_T \le K < (1 + \epsilon/S_0) S_T) = O(\epsilon),
\]
compared with $O(\epsilon^2)$ for a standard call. As a result, delta estimation is more difficult for the digital option, and a similar argument applies to barrier options generally. Even in these cases, the use of common random numbers can result in substantial improvement compared with differences based on independent runs.

Table 6 compares the performance of four types of delta estimates: forward and central finite differences with and without common random numbers.
The methods are compared at four values of the perturbation parameter $\epsilon$, and applied to the two options discussed above. The values in the table are estimated root mean square errors. The numerical results substantiate the analysis above. Much lower errors are obtained for the standard call than for the digital option, allowing for smaller $\epsilon$; central differences beat forward differences; common random numbers help, but they help the standard call more than the digital option. In several cases, the minimal error is obtained using a fairly large $\epsilon$. This reflects the fact that the bias resulting from a large $\epsilon$ is sometimes overwhelmed by the large variance resulting from a small $\epsilon$.

25 Also called a “binary” or “cash-or-nothing” option; see Hull (2000, p. 464).

Table 6. RMS errors for various delta estimation methods.

                                     Independent          Common
                       $\epsilon$  Forward  Central   Forward  Central
Standard Call Option   10           0.10     0.01      0.100    0.009
                       1            0.18     0.09      0.012    0.006
                       0.1          1.78     0.87      0.006    0.006
                       0.01         7.47     8.98      0.006    0.006
Digital Option         20           0.51     0.37      0.51     0.37
                       10           0.22     0.11      0.21     0.10
                       5            0.16     0.07      0.11     0.05
                       1            0.67     0.34      0.14     0.10

Root mean square error of delta estimates for two options using four methods with various values of $\epsilon$. Both options have $S_0 = 100$, $K = 100$, $\sigma = 0.40$, $r = 0.10$, and $T = 0.2$. The digital option has $B = 100$. Each entry is computed from 1000 delta estimates, each estimate based on 10 000 replications. The value of delta is 0.580 for the first option and 2.185 for the second.

Although we have discussed common random numbers in only a limited context, it can easily be applied to a wide range of problems. If all stochastic inputs to a simulation are samples from the normal distribution, then common random numbers can be implemented by using the same samples at two different parameter settings. More generally, if the stochastic inputs are all drawn from a sequence of uniform random variates, then common random numbers can be implemented by using these variates at two different parameter settings.
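The contrast between independent and common random numbers for the standard call can be reproduced with a few lines of code. The sketch below uses the parameter values stated under Table 6 and a forward difference with perturbation 0.1; the function names, the sample size, and the seed are illustrative choices, and the experiment is far smaller than the one summarized in the table.

```python
import math
import random
import statistics


def discounted_call_payoff(s0, z, k, r, sigma, t):
    """Discounted payoff of a call for initial price s0 and normal draw z."""
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - k, 0.0)


def forward_difference_deltas(eps, n, common, seed=0,
                              s0=100.0, k=100.0, r=0.10, sigma=0.40, t=0.2):
    """Return n forward-difference delta samples.

    common=True reuses the same normal draw for the base and perturbed
    prices; common=False draws a second, independent normal.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        z_perturbed = z if common else rng.gauss(0.0, 1.0)
        base = discounted_call_payoff(s0, z, k, r, sigma, t)
        bumped = discounted_call_payoff(s0 + eps, z_perturbed, k, r, sigma, t)
        samples.append((bumped - base) / eps)
    return samples


if __name__ == "__main__":
    for common in (False, True):
        d = forward_difference_deltas(0.1, 20000, common, seed=1)
        print(common, statistics.mean(d), statistics.stdev(d))
```

With independent draws the per-sample standard deviation grows like $1/\epsilon$, while with a common draw it stays bounded, which is the behaviour derived in (17).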

4.2 Direct estimates

Even with the improvements in performance obtained from common random numbers, derivative estimates based on finite differences still suffer from two shortcomings. They are biased (since they compute difference ratios rather than derivatives) and they require multiple resimulations: estimating sensitivities to $d$ parameter changes requires running one simulation with all parameters at their base values and $d$ additional simulations, one for each perturbed parameter.


The computation of 10–50 Greeks26 for a single security is not unheard of, and this represents a significant computational burden when multiple resimulations are required.

26 Sensitivities to various changes in the yield curve often account for several of these.

Over the last decade, a variety of direct methods have been developed for estimating derivatives by simulation. Direct methods compute a derivative estimate from a single simulation, and thus do not require resimulation at a perturbed parameter value. Under appropriate conditions, they result in unbiased estimates of the derivatives themselves, rather than of a finite-difference ratio. Our discussion focuses on the use of pathwise derivatives as direct estimates, based on a technique generally called infinitesimal perturbation analysis (see, e.g., Glasserman 1991).

The pathwise estimate of the true delta $dC/dS_0$ is the derivative of the sample price $\hat{C}$ with respect to $S_0$. More precisely, it is
\[
\frac{d\hat{C}}{dS_0} = \lim_{\epsilon \to 0} \epsilon^{-1} [\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)],
\]
provided the limit exists with probability 1. If $\hat{C}(S_0)$ and $\hat{C}(S_0 + \epsilon)$ are computed from the same $Z$, then provided $S_T \ne K$, we have
\[
\frac{d\hat{C}}{dS_0} = \frac{d\hat{C}}{dS_T} \frac{dS_T}{dS_0} = e^{-rT} \frac{S_T}{S_0} 1_{\{S_T > K\}}. \tag{18}
\]
We have used (13) to get
\[
\frac{dS_T}{dS_0} = e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z} = \frac{S_T}{S_0},
\]
and
\[
\frac{d\hat{C}}{dS_T} = e^{-rT} \frac{d}{dS_T} \max\{0, S_T - K\} =
\begin{cases}
e^{-rT}, & S_T > K; \\
0, & S_T < K.
\end{cases}
\]
At $S_T = K$, $\hat{C}$ fails to be differentiable; however, since this occurs with probability zero, the random variable $d\hat{C}/dS_0$ is almost surely well defined.

The pathwise derivative $d\hat{C}/dS_0$ can be thought of as a limiting case of the common random numbers finite-difference estimator in which we evaluate the limit analytically rather than numerically. It is a direct estimator of the option delta because it can be computed directly from a simulation starting at $S_0$ without the need for a separate simulation at a perturbed value of $S_0$. This is evident from the expression in (18). The question remains whether this estimator is unbiased; that is, whether
\[
E\left[ \frac{d\hat{C}}{dS_0} \right] = \frac{dC}{dS_0} \equiv \frac{d}{dS_0} E[\hat{C}].
\]
The unbiasedness of the pathwise estimate thus reduces to the interchangeability of derivative and expectation. The interchange is easily justified in this case; see Broadie and Glasserman (1996) for this example and conditions for more general cases.

Applying the same reasoning used above, we obtain the following pathwise estimators of three other Greeks for the Black–Scholes price:
\[
\begin{aligned}
\text{Rho } (dC/dr): &\quad K T e^{-rT} 1_{\{S_T \ge K\}} \\
\text{Vega } (dC/d\sigma): &\quad e^{-rT} 1_{\{S_T \ge K\}} \frac{S_T}{\sigma} \left[ \ln(S_T/S_0) - (r + \tfrac{1}{2}\sigma^2) T \right] \\
\text{Theta } (-dC/dT): &\quad r e^{-rT} \max(S_T - K, 0) - 1_{\{S_T \ge K\}} e^{-rT} \frac{S_T}{2T} \left[ \ln(S_T/S_0) + (r - \tfrac{1}{2}\sigma^2) T \right]
\end{aligned}
\]
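As a check on the pathwise estimators of delta, vega, and rho, the sketch below averages them over simulated terminal prices and compares the results with the Black–Scholes closed forms. It is an illustration only: function names, sample size, and seed are hypothetical. The vega line uses $(r + \frac{1}{2}\sigma^2)T$ inside the bracket, which is what differentiating (13) with respect to $\sigma$ gives.

```python
import math
import random


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def pathwise_greeks(s0, k, r, sigma, t, n, seed=0):
    """Sample means of the pathwise delta, vega and rho estimators."""
    rng = random.Random(seed)
    discount = math.exp(-r * t)
    delta = vega = rho = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
        if s_t > k:  # all three estimators vanish out of the money
            delta += discount * s_t / s0
            vega += discount * (s_t / sigma) * (
                math.log(s_t / s0) - (r + 0.5 * sigma ** 2) * t)
            rho += k * t * discount
    return delta / n, vega / n, rho / n


def black_scholes_greeks(s0, k, r, sigma, t):
    """Closed-form delta, vega and rho of a European call."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return (norm_cdf(d1),
            s0 * norm_pdf(d1) * math.sqrt(t),
            k * t * math.exp(-r * t) * norm_cdf(d2))


if __name__ == "__main__":
    args = (100.0, 100.0, 0.10, 0.40, 0.2)
    print(pathwise_greeks(*args, n=50000, seed=1))
    print(black_scholes_greeks(*args))
```

All three estimates come from a single set of simulated paths, with no perturbed resimulation.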

Each of these estimators is unbiased. Of course, Monte Carlo estimators are not required for these derivatives because closed-form expressions are available for each. The Black–Scholes setting is useful for illustration, but the utility of the technique rests on its applicability to more general models.

In Broadie and Glasserman (1996), pathwise estimates are derived and studied (both theoretically and numerically) for Asian options and a model with stochastic volatility. For example, the Asian-option delta estimate is simply
\[
e^{-rT} \frac{\bar{S}}{S_0} 1_{\{\bar{S} > K\}},
\]
where $\bar{S}$ is the average asset price used to determine the option payoff. Evaluating this expression takes negligible time compared with resimulating to estimate the option price from a perturbed initial stock price. The pathwise estimate is thus both more accurate and faster to compute than the finite-difference approximation. These advantages extend to a wide class of problems.

As already noted, the unbiasedness of pathwise derivative estimates depends on an interchange of derivative and expectation. In practice, this generally means that the security payoff should be a pathwise continuous function of the parameter in question. The standard call option payoff $e^{-rT} \max\{0, S_T - K\}$ is continuous in each of its parameters. An example where continuity fails is a digital option with payoff $e^{-rT} 1_{\{S_T > K\}} B$, with $B$ the amount received if the stock finishes in the money.27 Because of the discontinuity at $S_T = K$, the pathwise method (in its simplest form) cannot be applied to this type of option.

The problem of discontinuities often arises in the estimation of gamma, the second derivative of an option price with respect to the current price of an underlying asset. Consider, again, the standard European call option. We have an expression for $d\hat{C}/dS_0$ in (18) involving the indicator $1_{\{S_T > K\}}$. This shows that $d\hat{C}/dS_0$ is discontinuous in $S_T$, preventing us from differentiating pathwise a second time to get a direct estimator of gamma.

To address the problem of discontinuities, Broadie and Glasserman (1996) construct smoothed estimators. These estimators are unbiased, but not as simple to derive and implement as ordinary pathwise estimators.

Broadie and Glasserman also investigate another technique for direct derivative estimation called the likelihood ratio method. This method differentiates the probability density of an asset price, rather than the outcome of the asset price itself.28 The domains of this method and the pathwise method overlap, but neither contains the other. When both apply, the pathwise method generally has lower variance. Overviews of these methods can be found in Glasserman (1991), Glynn (1987), and Rubinstein and Shapiro (1993). For discussions specific to financial applications see Broadie and Glasserman (1996) and Fu and Hu (1995).
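To illustrate the likelihood ratio idea on the digital option, where the simple pathwise method fails, the sketch below multiplies the discounted payoff by the derivative of the log-density of $S_T$ with respect to $S_0$. Under the lognormal model in (13) this score works out to $Z/(S_0 \sigma \sqrt{T})$. That formula is a standard calculation supplied here as an assumption; it is not quoted from the text, and the function names and sample size are illustrative.

```python
import math
import random


def digital_delta_likelihood_ratio(s0, k, b, r, sigma, t, n, seed=0):
    """Average of payoff * score, where score = d/dS0 of the log-density of S_T."""
    rng = random.Random(seed)
    discount = math.exp(-r * t)
    sqrt_t = math.sqrt(t)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * sqrt_t * z)
        payoff = discount * b if s_t > k else 0.0
        score = z / (s0 * sigma * sqrt_t)  # assumed lognormal score function
        total += payoff * score
    return total / n


def digital_delta_exact(s0, k, b, r, sigma, t):
    """Closed-form delta of the cash-or-nothing option under Black-Scholes."""
    d2 = (math.log(s0 / k) + (r - 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    density = math.exp(-0.5 * d2 * d2) / math.sqrt(2.0 * math.pi)
    return math.exp(-r * t) * b * density / (s0 * sigma * math.sqrt(t))


if __name__ == "__main__":
    args = (100.0, 100.0, 100.0, 0.10, 0.40, 0.2)
    print(digital_delta_likelihood_ratio(*args, n=100000, seed=1))
    print(digital_delta_exact(*args))
```

The payoff itself is never differentiated, so the discontinuity at $S_T = K$ causes no difficulty. With the Table 6 parameters the closed form evaluates to about 2.185, the delta quoted there for the digital option.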

27 We used this example at the end of Section 3. The settings are related: problems for which common random numbers is particularly effective are generally problems to which the pathwise method can be applied even more effectively.

28 Though not presented in a Monte Carlo context, the expressions in Carr (1993) are potentially relevant to this approach.

5 Pricing American options by simulation

European contingent claims have cash flows that cannot be influenced by decisions of the owner. Examples include European options, barrier options, and many types of swaps. By contrast, the cash flows of American contingent claims depend both on the price path of the underlying asset or assets and the decisions of the owner. Many types of American contingent claims trade on exchanges and in the over-the-counter market. Examples include American options, American swaptions, shout options, and American Asian options. They also arise in other contexts, for example as “real options” in the theory of economic investment described in Dixit and Pindyck (1994).

To be concrete, suppose that we wish to estimate the quantity $\max_\tau E[e^{-r\tau} h(S_\tau)]$, where $r$ is the constant riskless interest rate, $h(S_\tau)$ is the payoff at time $\tau$ in state $S_\tau$, and the max is taken over all stopping times $\tau \le T$. This formulation of the American pricing problem will suffice to illustrate the major points. First, note that the state can be vector-valued, so the formulation applies to pricing American options on multiple assets. Second, since simulation algorithms are discrete in nature, the continuous-time exercise decision must be approximated by restricting the exercise opportunities to lie in a finite set of times $0 = t_0 < t_1 < \cdots < t_d = T$. This is not always a serious restriction. For example, for a call option on a stock which pays dividends at discrete points in time, it can be shown that early exercise is only optimal just prior to the ex-dividend dates. In other cases, Richardson or other extrapolation techniques can be used to better approximate the price with exercise in continuous time from a finite set of exercise opportunities.29 However, we now restrict attention to estimating the quantity
\[
P \equiv \max_\tau E[e^{-r\tau} h(S_\tau)], \tag{19}
\]

where the max is taken over all stopping times $\tau$ taking values in the set $\{t_i : i = 0, \ldots, d\}$.

The need to estimate an optimal stopping time is the crucial distinction between American and European pricing problems. If the state space is of low dimension, say three or less, a discretization scheme together with a dynamic programming algorithm can often be used to numerically approximate the value in (19). Even in these cases, simulation can be used to estimate the expectation in the recursive step. Simulation-based methods become essential when the dimension of the state space is large.

An obvious simulation-based algorithm for estimating the quantity $P$ in equation (19) is to generate a random path of states $S_{t_i}$, for $i = 1, \ldots, d$, and form the path estimate
\[
\hat{P} = \max_{i=0,\ldots,d} e^{-r t_i} h(S_{t_i}).
\]

However, this estimator corresponds to using perfect foresight, and so it is biased high. That is, $E[\hat{P}] \ge P$, which follows immediately from the inequality $\max_{i=0,\ldots,d} e^{-r t_i} h(S_{t_i}) \ge e^{-r\tau} h(S_\tau)$. A natural goal would be to develop an alternative unbiased estimator. A negative result in this regard is provided in Broadie and Glasserman (1997): among a large class of estimators, there is no unbiased estimator of $P$. In particular, the estimators proposed in Tilley (1993), Grant, Vora, and Weeks (1997), and Barraquand and Martineau (1995) are all biased. Unfortunately, they provide no way to estimate the extent of the bias or to correct for the bias in a general setting. Broadie and Glasserman (1997) circumvent this problem by developing two estimators, one biased high and one biased low (but both asymptotically unbiased), which can be used together to form a valid confidence interval for the quantity $P$. In the remainder of this section, we give brief descriptions of the four methods mentioned and describe some strengths and weaknesses of each.

29 Geske and Johnson (1984) gave the first financial application of Richardson extrapolation. An extensive treatment of extrapolation techniques is given in Marchuk and Shaidurov (1983).
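The upward bias of the perfect-foresight path estimate can be seen in a small experiment. The following is only a sketch under stated assumptions: the underlying follows risk-neutral geometric Brownian motion, the payoff is that of a put, and the comparison rule (exercise the first time the price falls to a fixed barrier) is a hypothetical stopping rule chosen for illustration. None of these choices, nor the function names, come from the text.

```python
import numpy as np

def simulate_gbm(s0, r, sigma, times, n_paths, rng):
    """Simulate risk-neutral geometric Brownian motion on the grid `times`."""
    dt = np.diff(times)
    z = rng.standard_normal((n_paths, dt.size))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.log(s0) + np.cumsum(log_inc, axis=1)
    # Prepend the common starting value S_{t_0} = s0 to every path.
    return np.hstack([np.full((n_paths, 1), float(s0)), np.exp(log_paths)])

def discounted_payoffs(paths, times, r, strike):
    """e^{-r t_i} h(S_{t_i}) for the assumed put payoff h(s) = max(strike - s, 0)."""
    return np.exp(-r * times) * np.maximum(strike - paths, 0.0)

def perfect_foresight(disc):
    """Path estimate: the largest discounted payoff along each path."""
    return disc.max(axis=1)

def threshold_rule(disc, paths, barrier):
    """Hypothetical stopping rule: stop at the first date with S <= barrier,
    otherwise at the last date."""
    hit = paths <= barrier
    last = paths.shape[1] - 1
    first = np.where(hit.any(axis=1), hit.argmax(axis=1), last)
    return disc[np.arange(paths.shape[0]), first]

rng = np.random.default_rng(0)
times = np.linspace(0.0, 1.0, 11)  # t_0, ..., t_d with d = 10
paths = simulate_gbm(100.0, 0.05, 0.2, times, 20_000, rng)
disc = discounted_payoffs(paths, times, 0.05, 100.0)
high = perfect_foresight(disc)
rule = threshold_rule(disc, paths, 90.0)
print("perfect foresight:", high.mean(), " threshold rule:", rule.mean())
```

On every path the perfect-foresight value is at least the value obtained under the stopping rule, so its sample mean can never fall below that of any implementable exercise policy evaluated on the same paths.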

6. Monte Carlo Methods for Security Pricing


5.1 Tilley’s bundling algorithm

Tilley (1993) sparked considerable interest by demonstrating the potential practicality of applying simulation to pricing American contingent claims. Tilley describes a “bundling procedure” for pricing an American option on a single underlying asset. To estimate $P$ he suggests simulating $n$ paths of asset prices denoted $S_{t_i}(j)$ for $i = 1, \ldots, d$ and $j = 1, \ldots, n$ in the usual way. Next, partition the asset price space and call the paths which fall into a given partition at a fixed time a “bundle.” A dynamic programming algorithm is applied to bundles to estimate $P$. In particular, the estimated option price $P_{t_i}(j)$ at time $t_i$ for path $j$ is the maximum of the immediate exercise value, $h(S_{t_i}(j))$, and the present value of continuing. The latter value is defined to be the average of $e^{-r(t_{i+1}-t_i)} P_{t_{i+1}}(k)$ over all paths $k$ which fall in the bundle containing path $j$ at time $t_i$. Details of the partitioning are given in Tilley (1993).

In order to implement the algorithm, all paths must be stored so they can be sorted into bundles at each time step. Since simulation typically requires a large number of paths for good estimates, the storage and sorting requirements can be significant. More importantly, the algorithm does not easily generalize to multiple state variables. In higher dimensions, it is not clear how to define the bundles. Even then it is likely that most partitions will contain very few paths and lead to a large bias, or the partitions will be so large that the continuation values are poorly estimated. Because Tilley’s algorithm uses the same paths to estimate the optimal decisions and the value, the estimator tends to be biased high (although the bundling induces an approximation which is difficult to analyze). Tilley introduces a “sharp boundary” variant which reduces the bias, but this variant does not easily generalize to higher dimensions.
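The backward recursion over bundles can be sketched as follows. This is a simplified illustration, not Tilley's actual procedure: the bundles here are assumed to be equal-size groups of paths sorted by asset price, the dynamics are assumed to be geometric Brownian motion, and the function names are hypothetical. The partitioning details and the sharp boundary variant of Tilley (1993) are not reproduced.

```python
import numpy as np

def simulate_gbm(s0, r, sigma, times, n_paths, rng):
    """Simulate risk-neutral geometric Brownian motion on the grid `times`."""
    dt = np.diff(times)
    z = rng.standard_normal((n_paths, dt.size))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.log(s0) + np.cumsum(log_inc, axis=1)
    return np.hstack([np.full((n_paths, 1), float(s0)), np.exp(log_paths)])

def bundled_price(paths, times, r, payoff, n_bundles):
    """Backward induction over bundles (equal-size groups of sorted paths)."""
    n, m = paths.shape
    value = payoff(paths[:, -1])  # P_{t_d}(j) = h(S_{t_d}(j))
    for i in range(m - 2, 0, -1):
        disc = np.exp(-r * (times[i + 1] - times[i]))
        order = np.argsort(paths[:, i])  # sort paths by asset price at t_i
        cont = np.empty(n)
        for bundle in np.array_split(order, n_bundles):
            # Continuation value: discounted average of next-step values
            # over all paths in the same bundle.
            cont[bundle] = disc * value[bundle].mean()
        value = np.maximum(payoff(paths[:, i]), cont)
    # At t_0 every path shares the same state, so there is a single bundle.
    disc0 = np.exp(-r * (times[1] - times[0]))
    return float(max(payoff(paths[0, 0]), disc0 * value.mean()))

rng = np.random.default_rng(1)
times = np.linspace(0.0, 1.0, 11)
paths = simulate_gbm(100.0, 0.05, 0.2, times, 20_000, rng)
put = lambda s: np.maximum(100.0 - s, 0.0)
print("bundled estimate:", bundled_price(paths, times, 0.05, put, 50))
```

Note that every path must be held in memory so that it can be sorted at each date, which is the storage cost mentioned above. With one path per bundle the recursion collapses to the perfect-foresight estimate, which shows how very small bundles push the estimate upward.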
Carriere (1996) contains further analysis of Tilley’s algorithm and suggests a procedure based on spline functions to reduce the bias. It remains to be seen whether the spline procedure is practical for higher dimensional problems. Nevertheless, for single state variable problems, Tilley demonstrated the potential practicality of applying simulation to American-style pricing problems.

5.2 Barraquand and Martineau’s stratified state aggregation (SSA) algorithm

Barraquand and Martineau (1995) propose a partitioning algorithm, but unlike Tilley’s bundling algorithm, they partition the payoff space instead of the state space. Hence, only a one dimensional space is partitioned at each time step, independent of the number of state variables.30 Their algorithm works as follows.

30 In fact, they distinguish between partitioning the state space, which they term “stratified state aggregation,” and partitioning the payoff space, which they term “stratified state aggregation along the payoff.” The latter method is the only one that they test or specify in detail. Hence we focus our discussion on this variant of their method.

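Fig. 9. State evolution. [Two-period tree for the state $(S_1, S_2)$: from $(8, 6)$ at $t_0$ the state moves to $(8, 8)$ or $(8, 4)$ at $t_1$, each with probability 1/2; from $(8, 8)$ it moves to $(14, 2)$ or $(2, 14)$ at $t_2$, each with probability 1/2; from $(8, 4)$ it moves to $(4, 2)$.]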

First, partition the payoff space into $K$ disjoint cells. Then simulate $n$ paths of asset prices denoted $S_{t_i}(j)$ for $i = 1, \ldots, d$ and $j = 1, \ldots, n$ in the usual way. For each payoff cell $k$ at time $t_i$, record the number of paths, $a_{t_i}(k)$, which fall into the cell. For each pair of cells $k$ and $l$ at consecutive times $t_i$ and $t_{i+1}$, record the number of paths, $b_{t_i}(k, l)$, which fall into both cells. Also, for each cell $k$ at time $t_i$, record the sum of the payoff values, $c_{t_i}(k) = \sum h(S_{t_i}(j))$, where the sum is over all paths $j$ which fall into cell $k$ at time $t_i$. The transition probability from $(t_i, k)$ to $(t_{i+1}, l)$ is approximated by $p_{t_i}(k, l) = b_{t_i}(k, l)/a_{t_i}(k)$. The estimated option price $P_{t_i}(k)$ at time $t_i$ in cell $k$ is the maximum of the immediate exercise value and the present value of continuing. The immediate exercise value is approximated by $c_{t_i}(k)/a_{t_i}(k)$. The present value of continuing is approximated by $e^{-r(t_{i+1}-t_i)} \sum_{l=1}^{K} p_{t_i}(k, l) P_{t_{i+1}}(l)$. This procedure can be applied backwards in time to determine the simulation estimate of the price $P$. Details of a payoff space partitioning scheme are given in Barraquand and Martineau (1995). Once a single path is generated and the summary information $a$, $b$, and $c$ is recorded, the path can be discarded. Hence the storage requirements with this method are modest: on the order of $K^2 d$.

One drawback of this method is a possible lack of convergence, as the following example illustrates. Figure 9 shows the evolution of two asset prices $(S_1, S_2)$. The option payoff is $h(S_1, S_2) = \max(S_1, S_2)$ and for convenience the riskless rate is taken to be zero. Using the risk-neutral probabilities in Figure 9, the true value of the option at time $t_0$ is 11, which at time $t_1$ involves exercise in state $(8, 4)$ but continuing in state $(8, 8)$. When the states are partitioned by their payoffs, these two states are indistinguishable.
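The recursion on the counts $a$, $b$ and $c$ can be sketched and applied to the two-asset example. This is an illustrative sketch only: the function name `ssa_price` is hypothetical, the cells are assumed to be the distinct payoff values 4, 8 and 14, and instead of random sampling the four paths below are an assumed equally weighted enumeration that reproduces the probabilities of the example.

```python
import numpy as np

def ssa_price(payoffs, cells, r, times):
    """Payoff-space aggregation.

    payoffs[j, i] is h(S_{t_i}(j)); cells[j, i] is the index of the payoff
    cell occupied by path j at time t_i. Returns the estimate at t_0.
    """
    n, m = payoffs.shape
    K = int(cells.max()) + 1
    a = np.bincount(cells[:, -1], minlength=K)
    c = np.bincount(cells[:, -1], weights=payoffs[:, -1], minlength=K)
    P = np.divide(c, a, out=np.zeros(K), where=a > 0)  # terminal cell values
    for i in range(m - 2, -1, -1):
        a = np.bincount(cells[:, i], minlength=K)                        # a_{t_i}(k)
        c = np.bincount(cells[:, i], weights=payoffs[:, i], minlength=K)  # c_{t_i}(k)
        b = np.zeros((K, K))
        np.add.at(b, (cells[:, i], cells[:, i + 1]), 1.0)                # b_{t_i}(k, l)
        visited = a > 0
        p = np.zeros((K, K))
        p[visited] = b[visited] / a[visited, None]                       # p_{t_i}(k, l)
        exercise = np.divide(c, a, out=np.zeros(K), where=visited)
        cont = np.exp(-r * (times[i + 1] - times[i])) * (p @ P)
        P = np.where(visited, np.maximum(exercise, cont), 0.0)
    return float(P[cells[0, 0]])

# Payoffs max(S1, S2) along four equally likely paths of the example:
# (8,6)->(8,8)->(14,2), (8,6)->(8,8)->(2,14), and (8,6)->(8,4)->(4,2) twice.
payoffs = np.array([[8, 8, 14], [8, 8, 14], [8, 8, 4], [8, 8, 4]], dtype=float)
cell_of = {4.0: 0, 8.0: 1, 14.0: 2}
cells = np.array([[cell_of[v] for v in row] for row in payoffs])
ssa_value = ssa_price(payoffs, cells, 0.0, [0.0, 1.0, 2.0])

# Dynamic programming on the full state (S1, S2), for comparison.
v_88 = max(8.0, 0.5 * 14.0 + 0.5 * 14.0)   # continue in state (8, 8)
v_84 = max(8.0, 4.0)                        # exercise in state (8, 4)
true_value = max(8.0, 0.5 * v_88 + 0.5 * v_84)
print(ssa_value, true_value)
```

Because the states $(8, 8)$ and $(8, 4)$ fall into the same payoff cell, the aggregated recursion must apply one decision to both, and its estimate falls short of the state-based value no matter how many paths are used.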
As seen in the payoff evolution in Figure 10, the best strategy at time $t_1$ in payoff state 8 is to continue. The apparent value of the option in Figure 10 is 9 (= (1/2)14 + (1/2)4). In this example, partitioning the payoff space leads to a significant underestimate of the option value. Hence, a simulation algorithm based on partitioning the payoff space cannot converge to the correct value. Although this example may seem contrived, Broadie and Detemple (1997) show that the payoff value is not a sufficient statistic for determining the optimal exercise decision for options on the maximum of several assets. Indeed, the payoff process $h(S_t)$ is hardly ever Markovian.

Fig. 10. Payoff evolution. [Tree for the payoff $h(S_1, S_2)$: payoff 8 at $t_0$ and 8 at $t_1$, followed by 14 or 4 at $t_2$, each with probability 1/2.]

There is currently no way to bound the error in the Barraquand and Martineau method. Without an error estimate, it is difficult to determine the appropriate number of paths to simulate or the appropriate number of partitions to use. Their method can be slightly modified to generate an option price estimate which is biased low as follows. Their procedure gives an exercise strategy based on the immediate exercise payoff. Using this strategy, a new (independent) set of paths can be simulated, and an option value can be estimated under the exercise strategy previously estimated. The resulting option price estimate will be biased low because the exercise policy is not, in general, the optimal policy. With this modification, the average direction of the error is known. Raymar and Zwecher (1997) extend the Barraquand and Martineau approach by basing the exercise decision on a partition of two state-variables, rather than one.

5.3 Broadie and Glasserman’s random tree algorithm

Broadie and Glasserman (1997) propose an algorithm based on simulated trees. In order to handle the bias problem, they develop two estimators, one biased high and one biased low, but both convergent and asymptotically unbiased as the computational effort increases. A valid confidence interval for the true value $P$ is obtained by taking the upper confidence limit from the “high” estimator and the lower confidence limit from the “low” estimator. Briefly, their algorithm works as follows.


First, simulate a tree of asset prices (or, more generally, state variables) using $b$ branches at each node. Two paths emanating from a node evolve as independent copies of the state process. The high estimator, $\Theta$, is defined to be the value obtained by the usual dynamic programming algorithm applied to the simulated tree. Then repeat the process for $n$ trees, and compute a point estimate and confidence interval for $E[\Theta]$. A low estimator is obtained by modifying the dynamic programming algorithm at each node. Instead of using all $b$ branches to determine the decision and value, $b_1$ branches are used to determine the exercise decision, and the remaining $b_2 = b - b_1$ branches are used to determine the continuation value. Their actual low estimator, $\theta$, includes another modification of this procedure which reduces the variance of the estimate. As before, estimates from $n$ trees are combined to give a point estimate and confidence interval for $E[\theta]$. Details of the procedure can be found in Broadie and Glasserman (1997).

For the $\Theta$ estimator, all of the branches at a given node are used to determine the optimal decision and the corresponding node value, and this leads to an upward bias, i.e., $E[\Theta] \ge P$. For the $\theta$ estimator, the decision and the continuation value are determined from independent information sets. This eliminates the upward bias, but a downward bias occurs, i.e., $E[\theta] \le P$. The intuition for this result follows. If the correct decision is inferred at a node, the node value estimate would be unbiased. If the incorrect decision is inferred at a node, the node value estimate would be biased low because of the suboptimality of the decision. The expected node value is a weighted average of an unbiased estimate (based on the correct decision) and an estimate which is biased low (based on the incorrect decision). The net effect is an estimate which is biased low. Both estimators are consistent and asymptotically unbiased as $b$ increases.
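A minimal sketch of the two recursions follows. It implements only the simple split described above ($b_1$ branches for the decision, the remaining branches for the continuation value), not the variance-reducing modification in the authors' actual low estimator. The geometric Brownian motion dynamics, the function names, and the use of 1.96 as a normal quantile for the interval are assumptions made for illustration.

```python
import math
import numpy as np

def random_tree(s, i, times, r, sigma, payoff, b, b1, rng):
    """Return (high, low) estimates at a node with state s at time times[i]."""
    exercise = payoff(s)
    if i == len(times) - 1:
        return exercise, exercise
    dt = times[i + 1] - times[i]
    disc = math.exp(-r * dt)
    z = rng.standard_normal(b)
    # b independent successors of the current state (assumed GBM dynamics).
    children = s * np.exp((r - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
    est = [random_tree(c, i + 1, times, r, sigma, payoff, b, b1, rng)
           for c in children]
    high_next = np.array([e[0] for e in est])
    low_next = np.array([e[1] for e in est])
    # High estimator: all branches feed both the decision and the value.
    high = max(exercise, disc * high_next.mean())
    # Low estimator: first b1 branches decide, remaining branches give the value.
    if exercise >= disc * low_next[:b1].mean():
        low = exercise
    else:
        low = disc * low_next[b1:].mean()
    return high, low

def tree_estimators(n_trees, s0, times, r, sigma, payoff, b, b1, seed=0):
    """Average the high and low estimates over n_trees independent trees."""
    rng = np.random.default_rng(seed)
    draws = np.array([random_tree(s0, 0, times, r, sigma, payoff, b, b1, rng)
                      for _ in range(n_trees)])
    means = draws.mean(axis=0)
    std_errs = draws.std(axis=0, ddof=1) / math.sqrt(n_trees)
    return means, std_errs

def interval(means, std_errs, z=1.96):
    """Lower limit from the low estimator, upper limit from the high one."""
    return means[1] - z * std_errs[1], means[0] + z * std_errs[0]

put = lambda s: max(100.0 - s, 0.0)
means, std_errs = tree_estimators(200, 100.0, [0.0, 1 / 3, 2 / 3, 1.0],
                                  0.05, 0.2, put, b=6, b1=3)
print("high, low:", means, " interval:", interval(means, std_errs))
```

The recursion visits every node of every tree, so the running time grows geometrically in the number of exercise dates for a fixed branching number.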
The computational effort with this algorithm is of order $nb^d$ and its main drawback is that $d$ cannot be too large for practical computations. Broadie and Glasserman (1997) give numerical results for options with $d = 4$. As mentioned earlier, to approximate option values with continuous exercise opportunities, some type of extrapolation procedure is required. Special care is necessary to implement extrapolation procedures within a simulation context because of the randomness in the estimates.

5.4 Other developments31

Grant, Vora, and Weeks (1997) describe a method specially designed to price American arithmetic Asian options on a single underlying asset. In this application the optimal exercise decision depends on the current asset price and the current value of the average. Using repeated simulation runs, they attempt to identify the form of an optimal exercise policy based on these two pieces of information. Once an exercise policy is specified, simulation is used to estimate the option value under this fixed policy. Since the fixed policy is a suboptimal approximation to the optimal stopping rule, their procedure leads to a simulation estimator which is biased low. Grant, Vora, and Weeks perform extensive sensitivity analysis which indicates that their option value estimate is relatively insensitive to deviations in the chosen exercise policy. So it may be that their method gives good option price estimates relative to some accuracy level, but it is not clear how to quantify their error, nor how to improve their estimates to an arbitrary accuracy level as the simulation effort increases. Their procedure is specific to the case of American Asian options and does not at this point constitute a general approach to pricing American contingent claims.

Bossaerts (1989) proposes two estimators of optimal early exercise, a moment estimator and a smooth optimization estimator, and studies their convergence properties. His method appears to require a parametric representation of the exercise boundary and may therefore face difficulties in higher dimensions. The optimization approach described in Fu and Hu (1995) also requires a parametric representation.

Rust (1997)32 studies the general problem of solving discrete decision problems, which include optimal stopping problems as a special case. He develops a Monte Carlo method and shows that it succeeds in breaking the “curse of dimensionality” in these problems. Rust’s focus is on computational complexity, but his approach appears to provide a promising direction for finance applications.

31 More recent developments in pricing American options by simulation include Broadie and Glasserman (1997), Broadie, Glasserman and Ha (2000) and Longstaff and Schwartz (2001).

5.5 Summary

The valuation of securities with American-type features requires the determination of optimal decisions.
High dimension versions of these problems arise from multiple state variables and/or path dependencies. Although simulation is a powerful tool for solving some higher dimensional problems, conventional wisdom was that simulation could not be applied to American-style pricing problems. The algorithms described here represent the first attempts to solve these problems that were long thought to be computationally intractable.

32 We thank A. Dixit for pointing us to this reference.

6 Further topics

We conclude this paper with a brief mention of two important areas of current work in the application of Monte Carlo methods to finance, not discussed in this article.


A central numerical issue in simulating interest rates, asset prices with stochastic volatilities, and other complex diffusions is the accurate approximation of stochastic differential equations by discrete-time processes. Kloeden and Platen (1992) discuss a variety of methods for constructing discrete-time approximations with different orders of convergence. Andersen (1995) applies some of these to interest-rate models. In general, decreasing the time increment in a discrete approximation can be expected to give more accurate results, but at the expense of greater computational effort. Duffie and Glynn (1995) analyze this trade-off and characterize asymptotically optimal time steps as the overall computational effort grows.

In this article we have focused almost exclusively on the use of Monte Carlo for pricing. A related, growing area of application is risk management, in particular the use of Monte Carlo to assess value at risk, credit risk, and related measures. For some examples of recent applications in these areas see Iben and Brotherton-Ratcliffe (1994), Lawrence (1994), Beckström and Campbell (1995) and Glasserman, Heidelberger and Shahabuddin (2000).
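The discretization trade-off raised at the start of this section can be illustrated with the simplest scheme, the Euler approximation. The sketch below is an assumption-laden example rather than a method taken from the references: it uses geometric Brownian motion $dS = rS\,dt + \sigma S\,dW$ because the mean of the Euler approximation is then available in closed form, $S_0(1 + rh)^n$ with step $h = T/n$, and can be compared with the exact mean $S_0 e^{rT}$. The function names are hypothetical.

```python
import math
import numpy as np

def euler_terminal(s0, r, sigma, horizon, n_steps, n_paths, rng):
    """Euler scheme for dS = r S dt + sigma S dW; returns S at the horizon."""
    h = horizon / n_steps
    s = np.full(n_paths, float(s0))
    for _ in range(n_steps):
        dw = math.sqrt(h) * rng.standard_normal(n_paths)
        s = s + r * s * h + sigma * s * dw
    return s

def euler_mean_bias(s0, r, horizon, n_steps):
    """Exact bias of the Euler mean: s0 (1 + r h)^n - s0 e^{r T}."""
    h = horizon / n_steps
    return s0 * (1.0 + r * h) ** n_steps - s0 * math.exp(r * horizon)

rng = np.random.default_rng(0)
for n_steps in (1, 10, 100):
    sample = euler_terminal(100.0, 0.05, 0.2, 1.0, n_steps, 100_000, rng)
    print(n_steps, sample.mean(), euler_mean_bias(100.0, 0.05, 1.0, n_steps))
```

Each refinement of the time step reduces the bias but multiplies the work per path, which is the trade-off between discretization error and statistical error described above.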

Appendix: Moment controls beat moment matching asymptotically

As mentioned in Section 2.4, any time a moment is available for use with moment matching, it can alternatively be used as a control variate. In this appendix, we argue that moment matching is asymptotically equivalent to a control variate technique with suboptimal coefficients, and is therefore dominated by the optimal use of moments as controls. This asymptotic link applies in large samples. A related link between linear and nonlinear control variates is made in Glynn and Whitt (1989), but the current setting does not fit their framework.

Let $Z_1, Z_2, \ldots$ be i.i.d. (not necessarily normal) with mean $\mu$ and variance $\sigma^2$. Let $s$ denote the sample standard deviation of $Z_1, \ldots, Z_n$ and $\bar{Z}$ their sample mean. Suppose we want to estimate $E[f(Z)]$ for some function $f$. The standard estimator is $n^{-1} \sum_{i=1}^{n} f(Z_i)$ and the moment matching estimator is $n^{-1} \sum_{i=1}^{n} f(\tilde{Z}_i)$ with $\tilde{Z}_i$ defined in (9). For each $i$, the scaled difference

\[ \sqrt{n}(\tilde{Z}_i - Z_i) = \sqrt{n}\,\frac{\sigma - s}{s}\, Z_i - \sqrt{n}\,[(\sigma \bar{Z}/s) - \mu] \]

converges in distribution, by the central limit theorem for $\bar{Z}$ and $s$. Thus, $(\tilde{Z}_i - Z_i) = O_p(n^{-1/2})$ (see, e.g., Appendix A of Pollard 1984 for $O_p$, $o_p$ notation). Suppose now that, with probability one, $f$ is differentiable at $Z_i$. Then

\[ f(\tilde{Z}_i) = f(Z_i) + f'(Z_i)[\tilde{Z}_i - Z_i] + o_p(n^{-1/2}), \]

suggesting that up to terms $o_p(n^{-1/2})$ the moment matching estimator and standard estimator are related via

\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} f(\tilde{Z}_i)
&\approx \frac{1}{n}\sum_{i=1}^{n} f(Z_i) + \frac{1}{n}\sum_{i=1}^{n} f'(Z_i)[\tilde{Z}_i - Z_i] \\
&= \frac{1}{n}\sum_{i=1}^{n} f(Z_i) + \frac{1}{n}\sum_{i=1}^{n} f'(Z_i)\left[\left(\frac{\sigma}{s} - 1\right) Z_i - \frac{\sigma}{s}\bar{Z} + \mu\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} f(Z_i) + \left(\frac{1}{n}\sum_{i=1}^{n} f'(Z_i) Z_i\right)\left(\frac{\sigma}{s} - 1\right) + \left(\frac{1}{n}\sum_{i=1}^{n} f'(Z_i)\right)\left(\mu - \frac{\sigma}{s}\bar{Z}\right) \\
&\equiv \frac{1}{n}\sum_{i=1}^{n} f(Z_i) + \hat{\beta}_1\left(\frac{\sigma}{s} - 1\right) + \hat{\beta}_2\left(\mu - \frac{\sigma}{s}\bar{Z}\right),
\end{align*}

where $\hat{\beta}_i \to \beta_i$, $i = 1, 2$, as $n \to \infty$, with

\[ \beta_1 = E[f'(Z)Z] \quad \text{and} \quad \beta_2 = E[f'(Z)]. \]

Thus, moment matching is asymptotically equivalent to using

\[ \left(\frac{\sigma}{s} - 1\right) \quad \text{and} \quad \left(\mu - \frac{\sigma}{s}\bar{Z}\right) \tag{20} \]

as controls (both quantities converge to zero almost surely) with estimates of coefficients $\beta_1$, $\beta_2$. In general, these do not coincide with the optimal coefficients $\beta_1^*$, $\beta_2^*$, so moment matching is asymptotically dominated by the control variate method. In addition, the controls in (20) introduce some bias (as does moment matching itself) because though they converge to zero they do not have mean zero for finite $n$. In contrast, the more natural moment control variates $(s^2 - \sigma^2)$ and $(\bar{Z} - \mu)$ have mean zero for all $n$ and thus introduce no bias.

References

Acworth, P., M. Broadie, and P. Glasserman, 1997, A Comparison of Some Monte Carlo and Quasi Monte Carlo Methods for Option Pricing, in Monte Carlo and Quasi Monte Methods for Scientific Computing, G. Larcher, P. Hellekalek, H. Niederreiter, and P. Zinterhof (eds.), Springer-Verlag, Berlin.
Andersen, L., 1995, Efficient Techniques for Simulation of Interest Rate Models Involving Non-Linear Stochastic Differential Equations, Working paper (General Re Financial Products, New York, NY).
Andersen, L., and R. Brotherton-Ratcliffe, 1996, Exact Exotics, Risk 9, October, 85–89.
Barlow, R.E. and F. Proschan, 1975, Statistical Theory of Reliability and Life Testing (Holt, Reinhart and Winston, New York).
Barraquand, J., 1995, Numerical Valuation of High Dimensional Multivariate European Securities, Management Science 41, 1882–1891.


Barraquand, J. and D. Martineau, 1995, Numerical Valuation of High Dimensional Multivariate American Securities, Journal of Financial and Quantitative Analysis 30, 383–405.
Beaglehole, D., P. Dybvig, and G. Zhou, 1997, Going to Extremes: Correcting Simulation Bias in Exotic Option Valuation, Financial Analysts Journal (Jan/Feb) 62–68.
Beckström, R. and A. Campbell, 1995, An Introduction to VAR (CATS Software, Palo Alto, California).
Berman, L., 1996, Comparison of Path Generation Methods for Monte Carlo Valuation of Single Underlying Derivative Securities, Research Report RC-20570, IBM Research, Yorktown Heights, New York.
Birge, J.R., 1994, Quasi-Monte Carlo Approaches to Option Pricing, Technical Report 94–119 (Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109).
Bossaerts, P., 1989, Simulation Estimators of Optimal Early Exercise, Working paper (Carnegie-Mellon University, Pittsburgh, PA, 15213).
Boyle, P., 1977, Options: A Monte Carlo Approach, Journal of Financial Economics 4, 323–338.
Boyle, P. and D. Emanuel, 1985, The Pricing of Options on the Generalized Mean, Working paper (University of Waterloo).
Bratley, P. and B. Fox, 1988, ALGORITHM 659: Implementing Sobol’s Quasirandom Sequence Generator, ACM Transactions on Mathematical Software 14, 88–100.
Bratley, P., B.L. Fox, and H. Niederreiter, 1992, Implementation and Tests of Low-Discrepancy Sequences, ACM Transactions on Modelling and Computer Simulation 2, 195–213.
Bratley, P., B.L. Fox, and L. Schrage, 1987, A Guide to Simulation, 2nd Ed. (Springer-Verlag, New York).
Broadie, M. and J. Detemple, 1997, The Valuation of American Options on Multiple Assets, Mathematical Finance 7, 241–286.
Broadie, M. and J. Detemple, 1996, American Option Valuation: New Bounds, Approximations, and a Comparison of Existing Methods, Review of Financial Studies 9, 1211–1250.
Broadie, M. and P. Glasserman, 1996, Estimating Security Price Derivatives by Simulation, Management Science 42, 269–285.
Broadie, M. and P. Glasserman, 1997, Pricing American-Style Securities Using Simulation, Journal of Economic Dynamics and Control 21, 1323–1352.
Broadie, M. and P. Glasserman, 1997, A Stochastic Mesh Method for Pricing High-Dimensional American Options, Working paper, Columbia Business School, New York.
Broadie, M., P. Glasserman, and Z. Ha, 2000, Pricing American Options by Simulation Using a Stochastic Mesh with Optimized Weights, in Probabilistic Constrained Optimization, S. Uryasev, ed., 26–44 (Kluwer, Norwell, Mass.)
Caflisch, R.E., W. Morokoff, and A. Owen, 1998, Valuation of Mortgage Backed Securities Using Brownian Bridges to Reduce Effective Dimension, in Monte Carlo: Methodologies and Applications for Pricing and Risk Management, 301–314 (Risk Publications, London).
Carr, P., 1993, Deriving Derivatives of Derivative Securities, Working paper (Johnson Graduate School of Business, Cornell University).
Carriere, J.F., 1996, Valuation of the Early-Exercise Price for Derivative Securities using Simulations and Splines, Insurance: Mathematics and Economics 19, 19–30.
Carverhill, A. and K. Pang, 1995, Efficient and Flexible Bond Option Valuation in the Heath, Jarrow and Morton Framework, Journal of Fixed Income 5, September, 70–77.
Cheyette, O., 1992, Term Structure Dynamics and Mortgage Valuation, Journal of Fixed Income 2, March, 28–41.
Clewlow, L. and A. Carverhill, 1994, On the Simulation of Contingent Claims, Journal of Derivatives 2, Winter, 66–74.
Devroye, L., 1986, Non-Uniform Random Variate Generation (Springer-Verlag, New York).
Dixit, A. and R. Pindyck, 1994, Investment Under Uncertainty (Princeton University Press).
Duan, J.-C., 1995, The GARCH Option Pricing Model, Mathematical Finance 5, 13–32.
Duan, J.-C. and J.-G. Simonato, 1998, Empirical Martingale Simulation for Asset Prices, Management Science 44, 1218–1233.
Duffie, D., 1996, Dynamic Asset Pricing Theory, 2nd ed. (Princeton University Press, Princeton, New Jersey).
Duffie, D. and P. Glynn, 1995, Efficient Monte Carlo Simulation of Security Prices, Annals of Applied Probability 5, 897–905.
Faure, H., 1982, Discrépance de Suites Associées à un Système de Numération (en Dimension s), Acta Arithmetica 41, 337–351.
Fox, B.L., 1986, ALGORITHM 647: Implementation and Relative Efficiency of Quasi-Random Sequence Generators, ACM Transactions on Mathematical Software 12, 362–376.
Fu, M. and J.Q. Hu, 1995, Sensitivity Analysis for Monte Carlo Simulation of Option Pricing, Probability in the Engineering and Information Sciences 9, 417–446.
Fu, M., D. Madan, and T. Wong, 1998, Pricing Continuous Time Asian Options: A Comparison of Analytical and Monte Carlo Methods, Journal of Computational Finance 2, 49–74.
Geske, R. and H.E. Johnson, 1984, The American Put Options Valued Analytically, Journal of Finance 39, 1511–1524.
Glasserman, P., 1991, Gradient Estimation via Perturbation Analysis (Kluwer Academic Publishers, Norwell, Mass).
Glasserman, P., 1993, Filtered Monte Carlo, Mathematics of Operations Research 18, 610–634.
Glasserman, P., P. Heidelberger, and P. Shahabuddin, 2000, Variance Reduction Techniques for Estimating Value-at-Risk, Management Science 46, 1349–1365.
Glasserman, P. and D.D. Yao, 1992, Some Guidelines and Guarantees for Common Random Numbers, Management Science 38, 884–908.
Glynn, P.W., 1987, Likelihood Ratio Gradient Estimation: An Overview, in: Proceedings of the Winter Simulation Conference (The Society for Computer Simulation, San Diego, California) 366–374.
Glynn, P.W., 1989, Optimization of Stochastic Systems via Simulation, in: Proceedings of the Winter Simulation Conference (The Society for Computer Simulation, San Diego, California) 90–105.
Glynn, P.W. and D.L. Iglehart, 1988, Simulation Methods for Queues: An Overview, Queueing Systems 3, 221–255.
Glynn, P.W. and W. Whitt, 1989, Indirect Estimation via L = λW, Operations Research 37, 82–103.
Glynn, P.W. and W. Whitt, 1992, The Asymptotic Efficiency of Simulation Estimators, Operations Research 40, 505–520.
Grant, D., G. Vora, and D. Weeks, 1997, Path-Dependent Options: Extending the Monte Carlo Simulation Approach, Management Science 43, 1589–1602.
Halton, J.H., 1960, On the Efficiency of Certain Quasi-Random Sequences of Points in Evaluating Multi-Dimensional Integrals, Numerische Mathematik 2, 84–90.
Hammersley, J.M. and D.C. Handscomb, 1964, Monte Carlo Methods (Chapman and Hall, London).
Haselgrove, C.B., 1961, A Method for Numerical Integration, Mathematics of Computation 15, 323–337.
Hlawka, E., 1971, Discrepancy and Riemann Integration, in: L. Mirsky, ed., Studies in Pure Mathematics (Academic Press, New York).
Hull, J., 2000, Options, Futures, and Other Derivative Securities, 4th ed. (Prentice-Hall, Englewood Cliffs, New Jersey).
Hull, J. and A. White, 1987, The Pricing of Options on Assets with Stochastic Volatilities, Journal of Finance 42, 281–300.
Iben, B. and R. Brotherton-Ratcliffe, 1994, Credit Loss Distributions and Required Capital for Derivatives Portfolios, Journal of Fixed Income 4, June, 6–14.
Johnson, H., 1987, Options on the Maximum or the Minimum of Several Assets, Journal of Financial and Quantitative Analysis 22, 227–283.
Johnson, H. and D. Shanno, 1987, Option Pricing When the Variance is Changing, Journal of Financial and Quantitative Analysis 22, 143–151.
Joy, C., P.P. Boyle, and K.S. Tan, 1996, Quasi-Monte Carlo Methods in Numerical Finance, Management Science 42, 926–938.
Kemna, A.G.Z. and A.C.F. Vorst, 1990, A Pricing Method for Options Based on Average Asset Values, Journal of Banking and Finance 14, 113–129.
Kloeden, P. and E. Platen, 1992, Numerical Solution of Stochastic Differential Equations (Springer-Verlag, New York).
L’Ecuyer, P. and G. Perron, 1994, On the Convergence Rates of IPA and FDC Derivative Estimators, Operations Research 42, 643–656.
Lavenberg, S.S. and P.D. Welch, 1981, A Perspective on the Use of Control Variables to Increase the Efficiency of Monte Carlo Simulations, Management Science 27, 322–335.
Lawrence, D., 1994, Aggregating Credit Exposures: The Simulation Approach, in: Derivative Credit Risk (Risk Publications, London).
Longstaff, F.A. and E.S. Schwartz, 2001, Valuing American Options by Simulation: A Simple Least Squares Approach, Review of Financial Studies 14, 113–148.
Marchuk, G. and V. Shaidurov, 1983, Difference Methods and Their Extrapolations (Springer Verlag, New York).
McKay, M.D., W.J. Conover, and R.J. Beckman, 1979, A Comparison of Three Methods for Selecting Input Variables in the Analysis of Output from a Computer Code, Technometrics 21, 239–245.
Morokoff, W.J. and R.E. Caflisch, 1995, Quasi-Monte Carlo Integration, Journal of Computational Physics, 122, 218–230.
Moskowitz, B. and R.E. Caflisch, 1996, Smoothness and Dimension Reduction in Quasi-Monte Carlo Methods, Mathematical and Computer Modeling 23, 37–54.
Niederreiter, H., 1988, Low Discrepancy and Low Dispersion Sequences, Journal of Number Theory 30, 51–70.
Niederreiter, H., 1976, On the Distribution of Pseudo-Random Numbers Generated by the Linear Congruential Method. III, Mathematics of Computation 30, 571–597.
Niederreiter, H., 1992, Random Number Generation and Quasi-Monte Carlo Methods (CBMS-NSF 63, SIAM, Philadelphia, Pa).
Niederreiter, H. and C. Xing, 1996, Low-Discrepancy Sequences and Global Function Fields with Many Rational Places, Finite Fields and their Applications 2, 241–273.
Nielsen, S., 1994, Importance Sampling in Lattice Pricing Models, Working paper (Management Science and Information Systems, University of Texas at Austin).
Ninomiya, S., and S. Tezuka, 1996, Toward Real-Time Pricing of Complex Financial Derivatives, Applied Mathematical Finance 3, 1–20.
Owen, A., 1995a, Monte Carlo Variance of Scrambled Equidistribution Quadrature, in: H. Niederreiter and P.J.S. Shiue, eds., Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (Springer-Verlag, Berlin).
Owen, A., 1995b, Randomly Permuted (t, m, s)-Nets and (t, s)-Sequences, in Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, H. Niederreiter and P. Shiue (eds.), 299–317 (Springer-Verlag, New York).
Paskov, S. and J. Traub, 1995, Faster Valuation of Financial Derivatives, Journal of Portfolio Management 22, Fall, 113–120.
Pollard, D., 1984, Convergence of Stochastic Processes, Springer-Verlag, New York.
Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, 1992, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. (Cambridge University Press).
Raymar, S., and M. Zwecher, 1997, A Monte Carlo Valuation of American Call Options On the Maximum of Several Stocks, Journal of Derivatives 5 (Fall), 7–24.
Reider, R., 1993, An Efficient Monte Carlo Technique for Pricing Options, Working paper (Wharton School, University of Pennsylvania).
Rubinstein, R. and A. Shapiro, 1993, Discrete Event Systems (Wiley, New York).
Rust, J., 1997, Using Randomization to Break the Curse of Dimensionality, Econometrica 65, 487–516.
Schwartz, E.S. and W.N. Torous, 1989, Prepayment and the Valuation of Mortgage-Backed Securities, Journal of Finance 44, 375–392.
Scott, L.O., 1987, Option Pricing when the Variance Changes Randomly: Theory, Estimation, and an Application, Journal of Financial and Quantitative Analysis 22, 419–438.
Shaw, J., 1995, Beyond VAR and Stress Testing, in Monte Carlo: Methodologies and Applications for Pricing and Risk Management, 231–244 (Risk Publications, London).
Sobol’, I.M., 1967, On the Distribution of Points in a Cube and the Approximate Evaluation of Integrals, USSR Computational Mathematics and Mathematical Physics 7, 86–112.
Spanier, J. and E.H. Maize, 1994, Quasi-Random Methods for Estimating Integrals Using Relatively Small Samples, SIAM Review 36, 18–44.
Stein, M., 1987, Large Sample Properties of Simulations Using Latin Hypercube Sampling, Technometrics 29, 143–151.
Stulz, R.M., 1982, Options on the Minimum or the Maximum of Two Risky Assets, Journal of Financial Economics 10, 161–185.
Tezuka, S., 1994, A Generalization of Faure Sequences and its Efficient Implementation, Research Report RTO105 (IBM Research, Tokyo Research Laboratory, Kanagawa, Japan).
Tezuka, S., 1995, Uniform Random Numbers: Theory and Practice (Kluwer Academic Publishers, Boston).
Tilley, J.A., 1993, Valuing American Options in a Path Simulation Model, Transactions of the Society of Actuaries 45, 83–104.
Turnbull, S.M. and L.M. Wakeman, 1991, A Quick Algorithm for Pricing European Average Options, Journal of Financial and Quantitative Analysis 26, 377–389.
Van Rensberg, J. and G.M. Torrie, 1993, Estimation of Multidimensional Integrals: Is Monte Carlo the Best Method?, Journal of Physics A: Mathematical and General 26, 943–953.
Wiggins, J.B., 1987, Option Values under Stochastic Volatility: Theory and Empirical Evidence, Journal of Financial Economics 19, 351–372.
Willard, G.A., 1997, Calculating Prices and Sensitivities for Path-Dependent Derivative Securities in Multifactor Models, Journal of Derivatives 5 (Fall), 45–61.
Worzel, K.J., C. Vassiadou-Zeniou, and S.A. Zenios, 1994, Integrated Simulation and Optimization Models for Tracking Indices of Fixed-Income Securities, Operations Research 42, 223–233.
Zaremba, S.K., 1968, The Mathematical Basis of Monte Carlo and Quasi-Monte Carlo Methods, SIAM Review 10, 310–314.

Part two Interest Rate Modeling

7 A Geometric View of Interest Rate Theory
Tomas Björk

1 Introduction

1.1 Setup

We consider a bond market model (see Björk (1997), Musiela and Rutkowski (1997)) living on a filtered probability space $(\Omega, \mathcal{F}, \mathbf{F}, Q)$ where $\mathbf{F} = \{\mathcal{F}_t\}_{t \ge 0}$. The basis is assumed to carry a standard $m$-dimensional Wiener process $W$, and we also assume that the filtration $\mathbf{F}$ is the internal one generated by $W$. By $p(t, x)$ we denote the price, at $t$, of a zero coupon bond maturing at $t + x$, and the forward rates $r(t, x)$ are defined by

$$ r(t, x) = -\frac{\partial \log p(t, x)}{\partial x}. $$

Note that we use the Musiela parameterization, where $x$ denotes the time to maturity. The short rate $R$ is defined as $R(t) = r(t, 0)$, and the money account $B$ is given by $B(t) = \exp\left\{\int_0^t R(s)\,ds\right\}$. The model is assumed to be free of arbitrage in the sense that the measure $Q$ above is a martingale measure for the model. In other words, for every fixed time of maturity $T \ge 0$, the process $Z(t, T) = p(t, T - t)/B(t)$ is a $Q$-martingale.

Let us now consider a given forward rate model of the form

$$ \begin{cases} dr(t, x) = \beta(t, x)\,dt + \sigma(t, x)\,dW, \\ r(0, x) = r^o(0, x), \end{cases} \tag{1} $$

where, for each $x$, $\beta$ and $\sigma$ are given optional processes. The initial curve $\{r^o(0, x);\ x \ge 0\}$ is taken as given. It is interpreted as the observed forward rate curve. The standard Heath–Jarrow–Morton drift condition (Heath, Jarrow and Morton (1992)) can easily be transferred to the Musiela parameterization. The result (see Brace and Musiela (1994), Musiela (1993)) is as follows.


Proposition 1.1 (The forward rate equation) Under the martingale measure $Q$ the $r$-dynamics are given by

$$ dr(t, x) = \left\{ \frac{\partial}{\partial x} r(t, x) + \sigma(t, x) \int_0^x \sigma(t, u)^{\top}\,du \right\} dt + \sigma(t, x)\,dW(t), \tag{2} $$

$$ r(0, x) = r^o(0, x), \tag{3} $$

where $^{\top}$ denotes transpose.

1.2 Main problems

Suppose now that we are given a concrete model $\mathcal{M}$ within the above framework, i.e. suppose that we are given a concrete specification of the volatility process $\sigma$. We now formulate a couple of natural problems:

1. Take, in addition to $\mathcal{M}$, also as given a parameterized family $\mathcal{G}$ of forward rate curves. Under which conditions is the family $\mathcal{G}$ consistent with the dynamics of $\mathcal{M}$? Here consistency is interpreted in the sense that, given an initial forward rate curve in $\mathcal{G}$, the interest rate model $\mathcal{M}$ will only produce forward rate curves belonging to the given family $\mathcal{G}$.
2. When can the given, inherently infinite dimensional, interest rate model $\mathcal{M}$ be written as a finite dimensional state space model? More precisely, we seek conditions under which the forward rate process $r(t, x)$, induced by the model $\mathcal{M}$, can be realized by a system of the form

$$ dZ_t = a(Z_t)\,dt + b(Z_t)\,dW_t, \tag{4} $$

$$ r(t, x) = G(Z_t, x), \tag{5} $$

where $Z$ (interpreted as the state vector process) is a finite dimensional diffusion, $a(z)$, $b(z)$ and $G(z, x)$ are deterministic functions and $W$ is the same Wiener process as in (2).

As will be seen below, these two problems are intimately connected, and the main purpose of this chapter is to give an overview of some recent work in this area. The text is mainly based on Björk and Christensen (1999), Björk and Gombani (1999) and Björk and Svensson (1999), but the presentation given below is more focused on geometric intuition than the original articles, where full proofs, technical details and further results can be found. In the analysis below we use ideas from systems and control theory (see Isidori (1989)) as well as from nonlinear filtering theory (see Brockett (1981)). References to the literature will sometimes be given in the text, but will mainly be summarized in the Notes at the end of each section.

The organization of the text is as follows. In Section 2 we study the existence of a finite dimensional factor realization in the comparatively simple case when the forward rate volatilities are deterministic. In Section 3 we study the general consistency problem, and in Section 4 we use the consistency results from Section 3 in order to give a fairly complete picture of the nonlinear realization problem.

2 Linear realization theory

In the general case, the forward rate equation (2) is a highly nonlinear infinite dimensional SDE but, as can be expected, the special case of linear dynamics is much easier to handle. In this section we therefore concentrate on linear forward rate models, and look for finite dimensional linear realizations.

2.1 Deterministic forward rate volatilities

For the rest of the section we only consider the case when the volatility $\sigma(t, x) = [\sigma_1(t, x), \ldots, \sigma_m(t, x)]$ is a deterministic time-independent function $\sigma(x)$ of $x$ only.

Assumption 2.1 The volatility $\sigma$ is a deterministic $C^{\infty}$-mapping $\sigma : R_+ \to R^m$.

Denoting the function $x \mapsto r(t, x)$ by $r(t)$ we have, from (2),

$$ dr(t) = \{\mathbf{F} r(t) + D\}\,dt + \sigma\,dW(t), \tag{6} $$

$$ r(0) = r^o. \tag{7} $$

Here the linear operator $\mathbf{F}$ is defined by

$$ \mathbf{F} = \frac{\partial}{\partial x}, \tag{8} $$

whereas the function $D$ is given by

$$ D(x) = \sigma(x) \int_0^x \sigma(s)^{\top}\,ds. \tag{9} $$

The point to note here is that, because of our choice of a deterministic volatility $\sigma(x)$, the forward rate equation (6) is a linear (or rather affine) SDE. Because of this linearity (albeit in infinite dimensions) we therefore expect to be able to provide an explicit solution of (6). We now recall that a scalar equation of the form

$$ dy(t) = [a y(t) + b]\,dt + c\,dW(t) $$

has the solution

$$ y(t) = e^{at} y(0) + \int_0^t e^{a(t-s)} b\,ds + \int_0^t e^{a(t-s)} c\,dW(s), $$

and we are led to conjecture that the solution to (6) is given by the formal expression

$$ r(t) = e^{\mathbf{F}t} r^o + \int_0^t e^{\mathbf{F}(t-s)} D\,ds + \int_0^t e^{\mathbf{F}(t-s)} \sigma\,dW(s). $$

The formal exponential $e^{\mathbf{F}t}$ acts on real valued functions, and we have to figure out how it operates. From the standard series expansion of the exponential function one is led to write

$$ e^{\mathbf{F}t} f(x) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \mathbf{F}^n f(x). \tag{10} $$

In our case $\mathbf{F}^n = \partial^n / \partial x^n$, so (assuming $f$ to be analytic) we have

$$ e^{\mathbf{F}t} f(x) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \frac{\partial^n f}{\partial x^n}(x). \tag{11} $$

This is, however, just the Taylor series expansion of $f$ around the point $x$, so for an analytic $f$ we have $e^{\mathbf{F}t} f(x) = f(x + t)$. We have in fact the following precise result (which can be proved rigorously).

Proposition 2.2 The operator $\mathbf{F}$ is the infinitesimal generator of the semigroup of left translations, i.e. for any $f \in C[0, \infty)$ we have

$$ e^{\mathbf{F}t} f(x) = f(t + x). $$

The solution of the forward rate equation (6) is given as

$$ r(t, x) = e^{\mathbf{F}t} r^o(0, x) + \int_0^t e^{\mathbf{F}(t-s)} D(x)\,ds + \int_0^t e^{\mathbf{F}(t-s)} \sigma(x)\,dW(s) \tag{12} $$

or equivalently by

$$ r(t, x) = r^o(0, x + t) + \int_0^t D(x + t - s)\,ds + \int_0^t \sigma(x + t - s)\,dW(s). \tag{13} $$
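The series (11) can be checked numerically for a simple analytic function. The following is a minimal sketch, assuming the test function $f(x) = e^{-ax}$ and illustrative values of $a$, $x$ and $t$ that are not taken from the text; the helper name `shift_by_series` is hypothetical.

```python
import math

def shift_by_series(deriv_at_x, t, n_terms):
    """Truncated series sum_{n < n_terms} t^n / n! * f^(n)(x), cf. (11)."""
    total = 0.0
    for n in range(n_terms):
        total += t ** n / math.factorial(n) * deriv_at_x(n)
    return total

# Illustrative values (assumptions, not from the text).
a, x, t = 0.7, 1.2, 0.5

def deriv_at_x(n):
    # f(x) = exp(-a x) has n-th derivative (-a)^n exp(-a x).
    return (-a) ** n * math.exp(-a * x)

approx = shift_by_series(deriv_at_x, t, 25)   # e^{Ft} f(x) via the series
exact = math.exp(-a * (x + t))                # f(x + t), the left translation
print(approx, exact)
```

The two printed numbers should agree to floating point accuracy, which is the content of Proposition 2.2 for this particular $f$.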

From (12) it is clear by inspection that we may write the forward rate equation (6) as

$$ dr_0(t, x) = \mathbf{F} r_0(t, x)\,dt + \sigma(x)\,dW(t), \quad r_0(0, x) = 0, \tag{14} $$

$$ r(t, x) = r_0(t, x) + \delta(t, x), \tag{15} $$

where $\delta$ is given by

$$ \delta(t, x) = r^o(0, x + t) + \int_0^t D(x + t - s)\,ds. \tag{16} $$


Since $\delta(t, x)$ is not affected by the input $W$, we see that the problem of finding a realization for the term structure system (6) is equivalent to that of finding a realization for (14). We are thus led to the following definition.

Definition 2.3 A matrix triple $[A, B, C(x)]$ is called an $n$-dimensional realization of the systems (6) and (14) if $r_0$ has the representation

$$ dZ(t) = A Z(t)\,dt + B\,dW(t), \quad Z(0) = 0, \tag{17} $$

$$ r_0(t, x) = C(x) Z(t). \tag{18} $$

Our main problems are now as follows.

• Take as a priori given a volatility structure $\sigma(x)$.
• When does there exist a finite dimensional realization?
• If there exists a finite dimensional realization, what is the minimal dimension?
• How do we construct a minimal realization from knowledge of $\sigma$?
• Is there an economic interpretation of the state process $Z$ in the realization?

2.2 Existence of finite linear realizations

We will now go on to study the existence of a finite dimensional realization of the stochastic system (14), and in order to get some ideas, suppose that there actually exists a finite dimensional realization of (14) of the form (17)–(18). Solving (14), we have

$$ r_0(t, x) = \int_0^t e^{\mathbf{F}(t-s)} \sigma(x)\,dW(s) = \int_0^t \sigma(x + t - s)\,dW(s), $$

while, from the realization (17)–(18), we also have

$$ r_0(t, x) = C(x) Z(t) = C(x) \int_0^t e^{A(t-s)} B\,dW(s). $$

Thus we have, with probability one, for each $x$ and each $t$,

$$ \int_0^t \sigma_x(t - s)\,dW(s) = \int_0^t C(x) e^{A(t-s)} B\,dW(s), \tag{19} $$

where we use subindex $x$ to denote left translation, i.e. $f_x(t) = f(x + t)$. This leads us immediately to conjecture that the equation

$$ \sigma_x(t) = C(x) e^{At} B $$

must hold for all $x$ and $t$, and we have our first main result.


Proposition 2.4

1. The forward rate process has a finite dimensional linear realization if and only if the volatility function $\sigma$ can be written in the form

$$ \sigma(x) = C_0 e^{Ax} B. \tag{20} $$

2. If $\sigma$ has the form (20) then a concrete realization of $r_0$ is given by

$$ dZ(t) = A Z(t)\,dt + B\,dW(t), \quad Z(0) = 0, \tag{21} $$

$$ r_0(t, x) = C(x) Z(t), \tag{22} $$

with $A$, $B$ as in (20), and with $C(x) = C_0 e^{Ax}$. The forward rates $r(t, x)$ are then given by (15)–(16).

Proof It is clear from the discussion above that if there exists a finite realization, then we must have the factorization $\sigma_x(t) = C(x) e^{At} B$. Setting $x = 0$, and denoting $C(0)$ by $C_0$, in this case gives us the relation (20). If, on the other hand, $\sigma$ factors as in (20), then we simply define $Z$ as in (21). A direct calculation as above then shows that we have $r_0(t, x) = C_0 e^{Ax} Z(t)$.

Remark 2.5 Let us call a function of the form $c e^{Ax} b$, where $c$ is a row vector, $A$ is a square matrix and $b$ is a column vector, a quasi-exponential (or QE) function. The general form of a quasi-exponential function $f$ is given by

$$ f(x) = \sum_i e^{\lambda_i x} + \sum_j e^{\alpha_j x} \left[ p_j(x) \cos(\omega_j x) + q_j(x) \sin(\omega_j x) \right], \tag{23} $$

where $\lambda_i$, $\alpha_j$, $\omega_j$ are real numbers, whereas $p_j$ and $q_j$ are real polynomials.

QE functions will turn up again, so we list some simple properties.

Lemma 2.6 The following hold for the quasi-exponential functions:

• A function is QE if and only if it is a component of the solution of a vector valued linear ODE with constant coefficients.
• A function is QE if and only if it can be written as $f(x) = c e^{Ax} b$.
• If $f$ is QE, then its derivative $f'$ is QE.
• If $f$ is QE, then its primitive function is QE.
• If $f$ and $g$ are QE, then $fg$ is QE.
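As a small worked illustration of the factorization (20), consider the "hump-shaped" function $x e^{-ax}$. The matrices below are one possible choice, written down here as an assumption for illustration; other triples give the same function.

```latex
\[
A = \begin{pmatrix} -a & 1 \\ 0 & -a \end{pmatrix}, \qquad
e^{Ax} = e^{-ax} \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}, \qquad
C_0 = \begin{pmatrix} 1 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 \\ 1 \end{pmatrix},
\]
\[
C_0 e^{Ax} B
  = e^{-ax} \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix}
  = x e^{-ax}.
\]
```

The matrix exponential is explicit because $A = -aI + N$ with $N^2 = 0$, so that $e^{Ax} = e^{-ax}(I + xN)$. The polynomial factor in the QE function thus comes from a repeated eigenvalue of $A$.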


2.3 Transfer functions

Using ideas from linear systems theory, an alternative view of the realization problem is obtained by studying transfer functions, i.e. by going to the frequency domain. To get some intuition, consider again the equation

$$ dr_0(t, x) = \mathbf{F} r_0(t, x)\,dt + \sigma(x)\,dW(t), \quad r_0(0, x) = 0. \tag{24} $$

Let us now formally "divide by $dt$", which gives us

$$ \frac{dr_0}{dt}(t, x) = \mathbf{F} r_0(t, x) + \sigma(x) \frac{dW}{dt}(t), $$

where the formal time derivative $\frac{dW}{dt}(t)$ is interpreted as white noise. We interpret this equation as an input–output system where the random input signal $t \mapsto \frac{dW}{dt}(t)$ is transformed into the infinite dimensional output signal $t \mapsto r_0(t, \cdot)$. We thus view the equation as a version of the following controlled ODE:

$$ \frac{dr_0}{dt}(t, x) = \mathbf{F} r_0(t, x) + \sigma(x) u(t), \quad r_0(0) = 0, \tag{25} $$

where $u$ is a deterministic input signal. Generally speaking, tricks like this do not work directly, since we are ignoring the difference between standard differential calculus, which is used to analyze (25), and Itô calculus, which we use when dealing with SDEs. In this case, however, because of the linear structure, the second order Itô term will not come into play, so we are safe. (See the discussion in Section 3.4 around the Stratonovich integral for how to treat the nonlinear situation.)

It is now natural to study the transfer function for the system (25), which relates the Laplace transform of the input signal to the Laplace transform of the output signal.

Definition 2.7 The transfer function, $K(s, x)$, for (25) is determined by the relation

$$ \tilde{r}_0(s, x) = K(s, x) \tilde{u}(s), $$

where $\tilde{\ }$ denotes the Laplace transform in the $t$-variable.

From the uniqueness of the Laplace transform we then have the following result.

Lemma 2.8 The system

$$ dZ(t) = A Z(t)\,dt + B\,dW(t), \quad Z(0) = 0, \tag{26} $$

$$ r_0(t, x) = C(x) Z(t) \tag{27} $$

is a realization of

$$ dr_0(t, x) = \mathbf{F} r_0(t, x)\,dt + \sigma(x)\,dW(t), \quad r_0(0, x) = 0 \tag{28} $$

if and only if the deterministic control system

$$ \frac{dr_0}{dt}(t, x) = \mathbf{F} r_0(t, x) + \sigma(x) u(t) \tag{29} $$

has the same transfer function as the system

$$ \frac{dZ}{dt}(t) = A Z(t) + B u(t), \tag{30} $$

$$ r_0(t, x) = C(x) Z(t). \tag{31} $$

Furthermore we have

Lemma 2.9 The transfer function $K(s, x)$ of (29) is given by

$$ K(s, x) = \mathcal{L}[\sigma_x](s), $$

where $\mathcal{L}$ denotes the Laplace transform, and $\sigma_x$ denotes left translation.

Proof From (29) we have

$$ r_0(t, x) = \int_0^t \sigma(x + t - s) u(s)\,ds = [\sigma_x \star u](t), $$

where $\star$ denotes convolution, and thus $\tilde{r}_0(s, x) = \mathcal{L}[\sigma_x](s)\,\tilde{u}(s)$.

For concrete computation of a realization, the following result is useful.

Lemma 2.10

• The transfer function of the system (30)–(31) is given by $K(s, x) = C(x) [sI - A]^{-1} B$.
• The $r_0$ system has a finite realization if and only if there exists a factorization of the form $\mathcal{L}[\sigma_x](s) = C(x) [sI - A]^{-1} B$.
• Denote the transfer function of $r_0$ by $K(s, x)$, and assume that there exists a finite dimensional realization. If we have found $A$, $B$ and $C$ such that $K(s, 0) = C [sI - A]^{-1} B$, then a realization of $r_0$ is given by $[A, B, C e^{Ax}]$.


Proof The first assertion is immediately obtained by taking the Laplace transform of (30)–(31). The second follows from Lemma 2.8, and the third from Proposition 2.4.

If we want to find a concrete realization for a given system, we thus have two possibilities. We can either look for a factorization of the volatility function as $\sigma(x) = C e^{Ax} B$, or we can try to factor the transfer function as $K(s, 0) = C [sI - A]^{-1} B$. From a logical point of view the two approaches are equivalent, but from a practical point of view it is much easier to factor the transfer function than to factor the volatility. There are in fact a number of standard algorithms in the systems theoretic literature which construct a realization, given knowledge of the transfer functions. See Brockett (1970).
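One such standard construction for a scalar rational transfer function is the controllable companion form. The sketch below is illustrative only: the function names are hypothetical, the rational function used in the usage lines is an arbitrary example rather than one from the text, and it covers the single-input case with a strictly proper $K(s) = (b_1 s^{n-1} + \cdots + b_n)/(s^n + a_1 s^{n-1} + \cdots + a_n)$.

```python
def controllable_realization(num, den):
    """Companion-form (A, B, C) for K(s) = num(s) / den(s).

    den = [a1, ..., an] are the coefficients of s^n + a1 s^(n-1) + ... + an,
    num = [b1, ..., bn] are those of b1 s^(n-1) + ... + bn.
    """
    n = len(den)
    A = [[0.0] * n for _ in range(n)]
    A[0] = [-c for c in den]           # first row carries the denominator
    for i in range(1, n):
        A[i][i - 1] = 1.0              # sub-diagonal of ones
    B = [1.0] + [0.0] * (n - 1)
    C = list(num)
    return A, B, C

def solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = s / aug[i][i]
    return y

def transfer(A, B, C, s):
    """Evaluate C (sI - A)^{-1} B at a real number s."""
    n = len(B)
    M = [[(s if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    y = solve(M, B)
    return sum(c * v for c, v in zip(C, y))

# Arbitrary example: K(s) = (2s + 3) / (s^2 + 3s + 2).
A, B, C = controllable_realization([2.0, 3.0], [3.0, 2.0])
s = 1.5
print(transfer(A, B, C, s), (2 * s + 3) / (s ** 2 + 3 * s + 2))
```

The two printed values should coincide, confirming that the triple reproduces the given rational function at the chosen point.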

2.4 Minimal realizations

The purpose of this section is to determine the minimal dimension of a finite dimensional realization.

Definition 2.11 The dimension of a realization $[A, B, C(x)]$ is defined as the dimension of the corresponding state space. A realization $[A, B, C(x)]$ is said to be minimal if there is no other realization with smaller dimension. The McMillan degree, $\mathcal{D}$, of the forward rate system is defined as the dimension of a minimal realization.

In order to get a feeling for how to determine the McMillan degree, we note that $r_0$ has a finite dimensional realization if and only if $r_0$ evolves on a finite dimensional subspace in the infinite dimensional function space $\mathcal{H}$. Furthermore, it seems obvious that the McMillan degree equals the dimension of this subspace. In order to determine the subspace above, let us again view the $r_0$ system as a special case of the following controlled equation, where we have suppressed $x$:

$$ \frac{dr_0}{dt} = \mathbf{F} r_0(t) + \sigma u(t), \quad r_0(0) = 0. \tag{32} $$

The solution of this equation is given by

$$ r_0(t) = \int_0^t e^{\mathbf{F}(t-s)} \sigma u(s)\,ds = \sum_{n=0}^{\infty} \mathbf{F}^n \sigma \int_0^t \frac{(t - s)^n}{n!} u(s)\,ds. $$

This is a linear combination of vectors of the form $\mathbf{F}^n \sigma_i$, so we see that the smallest subspace $\mathcal{R}$ which contains $r_0(t)$ for all $t$ and for all choices of the input signal $u$ is given by

$$ \mathcal{R} = \mathrm{span}\left\{ \sigma, \mathbf{F}\sigma, \mathbf{F}^2\sigma, \ldots \right\} = \mathrm{span}\left\{ \mathbf{F}^k \sigma_i;\ i = 1, \ldots, m,\ k = 0, 1, \ldots \right\}. \tag{33} $$

We thus have the following result.

Proposition 2.12 Take the volatility function $\sigma = [\sigma_1, \ldots, \sigma_m]$ as given. Then the McMillan degree, $\mathcal{D}$, is given by

$$ \mathcal{D} = \dim(\mathcal{R}), \tag{34} $$

with $\mathcal{R}$ defined as in (33). The forward rate system thus admits a finite dimensional realization if and only if the space spanned by the components of $\sigma$ and all their derivatives is finite dimensional.
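The dimension in (34) can be computed mechanically for simple volatility functions. The sketch below makes a restrictive assumption that is not in the text: every component has the form $\sigma_i(x) = p_i(x) e^{-ax}$ with a polynomial $p_i$ and a common rate $a$, so that differentiation acts on the coefficient list of $p_i$. The function names are hypothetical.

```python
from fractions import Fraction

def derivative(p, a):
    """Coefficients q with (p(x) e^{-a x})' = q(x) e^{-a x}; p[k] multiplies x^k."""
    n = len(p)
    return [((k + 1) * p[k + 1] if k + 1 < n else 0) - a * p[k] for k in range(n)]

def rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [u - f * v for u, v in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def mcmillan_degree(components, a):
    """dim span{F^k sigma_i} for sigma_i(x) = p_i(x) e^{-a x} (assumed form)."""
    width = max(len(p) for p in components)
    rows = []
    for p in components:
        q = [Fraction(c) for c in p] + [Fraction(0)] * (width - len(p))
        # The derivatives live in a space of dimension `width`,
        # so the first `width` of them already span everything reachable.
        for _ in range(width):
            rows.append(q)
            q = derivative(q, a)
    return rank(rows)

a = Fraction(1, 2)                      # illustrative decay rate
print(mcmillan_degree([[1]], a))        # sigma(x) = e^{-a x}      -> 1
print(mcmillan_degree([[0, 1]], a))     # sigma(x) = x e^{-a x}    -> 2
```

Exact rational arithmetic is used so that the rank is not affected by rounding. Volatilities with several distinct exponents or trigonometric factors would need a richer basis than the one assumed here.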

2.5 Economic interpretation of the state space

In general, the state space of the minimal realization of a given system has no concrete (e.g. physical) interpretation. In our case, however, the states of the minimal realization turn out to have a simple economic interpretation in terms of a minimal set of "benchmark" forward rates.

Assume that $[A, B, C]$ is a minimal realization, of dimension $n$, of the forward rates as in (21)–(22). Let us choose a set of "benchmark" maturities $x_1, \ldots, x_n$. We use the notation $\bar{x} = (x_1, \ldots, x_n)$. Assume furthermore that the maturity vector $\bar{x}$ is chosen so that the matrix

$$ T(\bar{x}) = \begin{bmatrix} C e^{A x_1} \\ \vdots \\ C e^{A x_n} \end{bmatrix} $$

is invertible. It can be shown (see Björk and Gombani (1999)) that, outside a set of measure zero, this can always be done as long as the maturities are distinct. We use the notation

$$ r_0(t, \bar{x}) = \begin{bmatrix} r_0(t, x_1) \\ \vdots \\ r_0(t, x_n) \end{bmatrix} $$

and corresponding interpretations for column vectors like $r(t, \bar{x})$, $\delta(t, \bar{x})$ etc. The following result shows how the entire term structure is determined by the benchmark forward rates.

Proposition 2.13 Assume that (21)–(22) is a minimal realization of the forward rates, and assume furthermore that a maturity vector $\bar{x} = (x_1, \ldots, x_n)$ is chosen as above. Then the following hold.

• With notation as above, the vector $r(t, \bar{x})$ of benchmark forward rates has the dynamics

$$ dr(t, \bar{x}) = \left\{ T(\bar{x}) A T^{-1}(\bar{x})\, r(t, \bar{x}) + \Psi(t, \bar{x}) \right\} dt + T(\bar{x}) B\,dW(t), \quad r(0, \bar{x}) = r^o(0, \bar{x}), \tag{35} $$

where the deterministic function $\Psi$ is given by

$$ \Psi(t, \bar{x}) = \frac{\partial r^o}{\partial x}(0, t\bar{e} + \bar{x}) + D(t\bar{e} + \bar{x}) - T(\bar{x}) A T^{-1}(\bar{x})\, \delta(t, \bar{x}). $$

Here $\bar{e} \in R^n$ denotes the vector with unit components, i.e. $\bar{e} = (1, 1, \ldots, 1)^{\top}$.

• The system of benchmark forward rates determines the entire forward rate process according to the formula

$$ r(t, x) = C e^{Ax} T^{-1}(\bar{x})\, r(t, \bar{x}) - C e^{Ax} T^{-1}(\bar{x})\, \delta(t, \bar{x}) + \delta(t, x). \tag{36} $$

• The correspondence between $Z$ and $r$ is given by

$$ r_0(t, \bar{x}) = T(\bar{x}) Z(t). \tag{37} $$

Proof See Björk and Gombani (1999).

The conclusion is thus that the state variables of a minimal realization can be interpreted as an affine transformation of a vector of benchmark forward rates.
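The change of coordinates (37) can be sketched numerically in two dimensions. The output map used below, $C(x) = [x e^{-ax},\ (1 + ax) e^{-ax}]$, is the one belonging to a two-dimensional realization of a hump-shaped volatility $x e^{-ax}$; it is adopted here as an illustrative assumption, and the parameter values, state vector and function names are hypothetical.

```python
import math

a = 0.5  # illustrative decay rate (assumption)

def C(x):
    """Row vector C(x) of the assumed two-dimensional realization."""
    return [x * math.exp(-a * x), (1 + a * x) * math.exp(-a * x)]

def benchmark_matrix(x1, x2):
    """T(xbar): one row C(x_i) per benchmark maturity."""
    return [C(x1), C(x2)]

def inverse_2x2(T):
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    if abs(det) < 1e-14:
        raise ValueError("benchmark maturities give a singular T")
    return [[T[1][1] / det, -T[0][1] / det],
            [-T[1][0] / det, T[0][0] / det]]

def curve_from_benchmarks(x, bench, x1, x2):
    """r0(t, x) = C(x) T^{-1}(xbar) r0(t, xbar), the r0-part of (36)."""
    Tinv = inverse_2x2(benchmark_matrix(x1, x2))
    z = [sum(Tinv[i][j] * bench[j] for j in range(2)) for i in range(2)]
    return sum(c * v for c, v in zip(C(x), z))

# A hypothetical state Z(t) and the two benchmark rates it produces via (37).
Z = [0.3, -0.1]
x1, x2 = 0.0, 2.0
T = benchmark_matrix(x1, x2)
bench = [sum(T[i][j] * Z[j] for j in range(2)) for i in range(2)]

for x in (0.5, 1.0, 4.0):
    direct = sum(c * z for c, z in zip(C(x), Z))
    print(x, direct, curve_from_benchmarks(x, bench, x1, x2))
```

For each maturity the value recovered from the two benchmark rates should match the value computed directly from the state, which is the sense in which the benchmarks can replace $Z$ as state variables.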

2.6 Examples

In this section we will give some simple illustrations of the theory. Note the handling of multiple roots of the matrix $A$, and the fact that the input noise can have dimension smaller than the dimension of $A$.

Example 2.14 $\sigma(x) = \sigma e^{-ax}$

We consider a model driven by a one-dimensional Wiener process, having the forward rate volatility structure

$$ \sigma(x) = \sigma e^{-ax}, $$

where $\sigma$ in the right hand side denotes a constant. (The reader will probably recognize this example as the Hull–White model.) We start by determining the McMillan degree $\mathcal{D}$, and by Proposition 2.12 we have $\mathcal{D} = \dim(\mathcal{R})$, where the space $\mathcal{R}$ is given by

$$ \mathcal{R} = \mathrm{span}\left\{ \frac{d^k}{dx^k}\, \sigma e^{-ax};\ k \ge 0 \right\}. $$

It is obvious that $\mathcal{R}$ is one dimensional, and that it is spanned by the single function $e^{-ax}$. Thus the McMillan degree is given by $\mathcal{D} = 1$.

We now want to apply Proposition 2.4 to find a realization, so we must factor the volatility function. In this case this is easy, since we have the trivial factorization $\sigma(x) = 1 \cdot e^{-ax} \cdot \sigma$. In the notation of Proposition 2.4 we thus have

$$ C_0 = 1, \quad A = -a, \quad B = \sigma. $$

A realization of the forward rates is thus given by

$$ dZ(t) = -a Z(t)\,dt + \sigma\,dW(t), $$
$$ r_0(t, x) = e^{-ax} Z(t), $$
$$ r(t, x) = r_0(t, x) + \delta(t, x), $$

and since the state space in this realization is of dimension one, the realization is minimal. We see that if $a > 0$ then the system is asymptotically stable.

We now go on to the interpretation of the state space, and since $\mathcal{D} = 1$ we can choose a single benchmark maturity. The canonical choice is of course $x_1 = 0$, i.e. we choose the instantaneous short rate $R(t)$ as the state variable. In the notation of Proposition 2.13 we then have $T(\bar{x}) = 1$, $r(t, \bar{x}) = R(t)$, and we get the short rate dynamics

$$ dR(t) = \{\Psi(t, 0) - a R(t)\}\,dt + \sigma\,dW(t). $$

Thus we see that we have indeed the Hull–White extension of the Vasiček model (1977). Note however that we do not have to choose the benchmark maturity as $x_1 = 0$. We can in fact choose any fixed maturity, $x_1$, and then use the corresponding forward rate as benchmark. Since $T(\bar{x}) B = \sigma e^{-a x_1}$, this will give us the dynamics

$$ dr(t, x_1) = \{\Psi(t, x_1) - a\, r(t, x_1)\}\,dt + \sigma e^{-a x_1}\,dW(t), $$

and now the entire forward rate curve will be determined by the $x_1$-rate according to formula (36).

Example 2.15 $\sigma(x) = x e^{-ax}$

In this example we still have a single driving Wiener process, but the volatility function is now "hump-shaped". By taking derivatives of $\sigma(x)$ we immediately see, from Proposition 2.12, that $\mathcal{R}$ is given by

$$ \mathcal{R} = \mathrm{span}\left\{ x e^{-ax},\ e^{-ax} \right\}, $$

so in this case $\mathcal{D} = 2$, and we have a two-dimensional minimal state space. In order to obtain a realization we compute the transfer function $K(s, x)$, which is given by Lemma 2.9 as

$$ K(s, x) = \mathcal{L}\left[ (x + \cdot)\, e^{-a(x + \cdot)} \right](s). $$

An easy calculation gives us

$$ K(s, x) = \frac{x e^{-ax}}{a + s} + \frac{e^{-ax}}{(a + s)^2} = \frac{s x e^{-ax} + (1 + ax) e^{-ax}}{(a + s)^2}, $$

and we now look for a realization of this transfer function (for a fixed $x$). The obvious thing to do is to use the standard controllable realization (see Brockett (1970)), and we obtain

$$ C(x) = \left[ x e^{-ax},\ (1 + ax) e^{-ax} \right], \quad A = \begin{bmatrix} -2a & -a^2 \\ 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}. $$

Since $\mathcal{D} = 2$ and this realization is two-dimensional we have a minimal realization, given by

$$ dZ_1(t) = -2a Z_1(t)\,dt - a^2 Z_2(t)\,dt + dW(t), $$
$$ dZ_2(t) = Z_1(t)\,dt, $$
$$ r_0(t, x) = x e^{-ax} Z_1(t) + (1 + ax) e^{-ax} Z_2(t), $$
$$ r(t, x) = r_0(t, x) + \delta(t, x). $$

We have a double eigenvalue of the system matrix $A$ at $\lambda_1 = -a$, so if $a > 0$ the system is asymptotically stable.
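The realization in Example 2.15 can be sanity-checked against the factorization $\sigma_x(t) = C(x) e^{At} B$ from Section 2.2. The sketch below uses a plain power series for the matrix exponential, which is adequate only for the small arguments used here; the value of $a$ and the helper names are illustrative assumptions.

```python
import math

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def expm(A, t, terms=40):
    """exp(A t) by its power series (fine for the small |A t| used below)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        step = [[A[i][j] * t / k for j in range(n)] for i in range(n)]
        term = mat_mul(term, step)          # term = (A t)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

a = 0.5  # illustrative value (assumption)
A = [[-2 * a, -a * a], [1.0, 0.0]]
B = [1.0, 0.0]

def C(x):
    return [x * math.exp(-a * x), (1 + a * x) * math.exp(-a * x)]

def impulse_response(x, t):
    """C(x) exp(A t) B, which should equal sigma(x + t)."""
    E = expm(A, t)
    EB = [sum(E[i][j] * B[j] for j in range(2)) for i in range(2)]
    return sum(c * v for c, v in zip(C(x), EB))

def sigma(x):
    return x * math.exp(-a * x)

for x, t in [(0.0, 1.0), (1.0, 0.5), (2.0, 3.0)]:
    print(x, t, impulse_response(x, t), sigma(x + t))
```

Agreement of the last two columns for each pair $(x, t)$ is what the factorization requires. In closed form, $e^{At} B = \left( (1 - at) e^{-at},\ t e^{-at} \right)^{\top}$, from which the identity can also be verified by hand.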

2.7 Notes

This section is mainly based on Björk and Gombani (1999). The first paper to appear in this area was to our knowledge the preprint (Musiela (1993)), where the Musiela parameterization and the space $\mathcal{R}$ are discussed in some detail. See also the closely related and interesting preprints El Karoui and Lacoste (1993), El Karoui, Geman and Lacoste (1997) and Zabczyk (1992). Because of the linear structure, the theory above is closely connected to (and in a sense inverse to) the theory of affine term structures developed in Duffie and Kan (1996). The standard reference on infinite dimensional SDEs is Da Prato and Zabczyk (1992), where one also can find a presentation of the connections between control theory and infinite dimensional linear stochastic equations.

3 Invariant manifolds

In this section we study when a given submanifold of forward rate curves is invariant under the action of a given interest rate model. This problem is of interest from an applied as well as from a theoretical point of view. In particular we will use the results from this section to analyze problems about existence of finite dimensional factor realizations for interest rate models on forward rate form. Invariant manifolds are, however, also of interest in their own right, so we begin by discussing a concrete problem which naturally leads to the invariance concept.

3.1 Parameter recalibration

A standard procedure when dealing with concrete interest rate models on a high frequency (say, daily) basis can be described as follows:

1. At time $t = 0$, use market data to fit (calibrate) the model to the observed bond prices.
2. Use the calibrated model to compute prices of various interest rate derivatives.
3. The following day ($t = 1$), repeat the procedure in 1 above in order to recalibrate the model, etc.

To carry out the calibration in step 1 above, the analyst typically has to produce a forward rate curve $\{r^o(0, x);\ x \ge 0\}$ from the observed data. However, since only a finite number of bonds actually trade in the market, the data consist of a discrete set of points, and a need to fit a curve to these points arises. This curve-fitting may be done in a variety of ways. One way is to use splines, but a number of parameterized families of smooth forward rate curves have also become popular in applications, the most well-known probably being the Nelson–Siegel family (see Nelson and Siegel (1987)). Once the curve $\{r^o(0, x);\ x \ge 0\}$ has been obtained, the parameters of the interest rate model may be calibrated to this.

Now, from a purely logical point of view, the recalibration procedure in step 3 above is of course slightly nonsensical: if the interest rate model at hand is an exact picture of reality, then there should be no need to recalibrate. The reason that everyone insists on recalibrating is of course that any model in fact is only an approximate picture of the financial market under consideration, and recalibration allows the incorporation of newly arrived information in the approximation. Even so, the calibration procedure itself ought to take into account that it will be repeated. It appears that the optimal way to do so would involve a combination of time series and cross-section data, as opposed to the purely cross-sectional curve-fitting, where the information contained in previous curves is discarded in each recalibration.

The cross-sectional fitting of a forward curve and the repeated recalibration is thus, in a sense, a pragmatic and somewhat non-theoretical endeavor. Nonetheless, there are some nontrivial theoretical problems to be dealt with in this context, and the problem to be studied in this section concerns the consistency between, on the one hand, the dynamics of a given interest rate model, and, on the other hand, the forward curve family employed.

What, then, is meant by consistency in this context? Assume that a given interest rate model $\mathcal{M}$ (e.g. the Hull–White model (1990)) in fact is an exact picture of the financial market. Now consider a particular family $\mathcal{G}$ of forward rate curves (e.g. the Nelson–Siegel family) and assume that the interest rate model is calibrated using this family. We then say that the pair $(\mathcal{M}, \mathcal{G})$ is consistent (or, that $\mathcal{M}$ and $\mathcal{G}$ are consistent) if all forward curves which may be produced by the interest rate model $\mathcal{M}$ are contained within the family $\mathcal{G}$. Otherwise, the pair $(\mathcal{M}, \mathcal{G})$ is inconsistent.

Thus, if $\mathcal{M}$ and $\mathcal{G}$ are consistent, then the interest rate model actually produces forward curves which belong to the relevant family. In contrast, if $\mathcal{M}$ and $\mathcal{G}$ are inconsistent, then the interest rate model will produce forward curves outside the family used in the calibration step, and this will force the analyst to change the model parameters all the time, not because the model is an approximation to reality, but simply because the family does not go well with the model.

Put into more operational terms this can be rephrased as follows.

• Suppose that you are using a fixed interest rate model $\mathcal{M}$. If you want to do recalibration, then your family $\mathcal{G}$ of forward rate curves should be chosen in such a way as to be consistent with the model $\mathcal{M}$.

Note however that the argument can also be run backwards, yielding the following conclusion for empirical work.

• Suppose that a particular forward curve family $\mathcal{G}$ has been observed to provide a good fit, on a day-to-day basis, in a particular bond market. Then this gives you modeling information about the choice of an interest rate model in the sense that you should try to use/construct an interest rate model which is consistent with the family $\mathcal{G}$.

We now have a number of natural problems to study.

I. Given an interest rate model $\mathcal{M}$ and a family of forward curves $\mathcal{G}$, what are necessary and sufficient conditions for consistency?
II. Take as given a specific family $\mathcal{G}$ of forward curves (e.g. the Nelson–Siegel family). Does there exist any interest rate model $\mathcal{M}$ which is consistent with $\mathcal{G}$?
III. Take as given a specific interest rate model $\mathcal{M}$ (e.g. the Hull–White model). Does there exist any finitely parameterized family of forward curves $\mathcal{G}$ which is consistent with $\mathcal{M}$?

In this section we will mainly address problem I above. Problem II has been studied, for special cases, in Filipović (1998a,b), whereas Problem III can be shown (see Proposition 4.6) to be equivalent to the problem of finding a finite dimensional factor realization of the model $\mathcal{M}$, and we provide a fairly complete solution in Section 4.

3.2 Invariant manifolds

We now move on to give a precise mathematical definition of the consistency property discussed above, and this leads us to the concept of an invariant manifold.

Definition 3.1 (Invariant manifold) Take as given the forward rate process dynamics (2). Consider also a fixed family (manifold) of forward rate curves $\mathcal{G}$. We say that $\mathcal{G}$ is locally invariant under the action of $r$ if, for each point $(s, r) \in R_+ \times \mathcal{G}$, the condition $r_s \in \mathcal{G}$ implies that $r_t \in \mathcal{G}$, on a time interval with positive length. If $r$ stays forever on $\mathcal{G}$, we say that $\mathcal{G}$ is globally invariant.

The purpose of this section is to characterize invariance in terms of local characteristics of $\mathcal{G}$ and $\mathcal{M}$, and in this context local invariance is the best one can hope for. In order to save space, local invariance will therefore be referred to as invariance.


To get some intuitive feeling for the invariance concepts one can consider the following two-dimensional deterministic system

$$ \frac{dy_1}{dt} = y_2, \qquad \frac{dy_2}{dt} = -y_1. $$

For this system it is obvious that the unit circle $C = \{(y_1, y_2) : y_1^2 + y_2^2 = 1\}$ is globally invariant, i.e. if we start the system on $C$ it will stay forever on $C$. The 'upper half' of the circle, $C_u = \{(y_1, y_2) : y_1^2 + y_2^2 = 1,\ y_2 > 0\}$, is on the other hand only locally invariant, since the system will leave $C_u$ at the point $(1, 0)$. This geometric situation is in fact the generic one also for our infinite dimensional stochastic case. The forward rate trajectory will never leave a locally invariant manifold at a point in the relative interior of the manifold. Exit from the manifold can only take place at the relative boundary points. We have no general method for determining whether a locally invariant manifold is also globally invariant or not. Problems of this kind have to be solved separately for each particular case.
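The distinction between global and local invariance in this toy system can be seen directly from its explicit solution, which is a clockwise rotation. The following sketch is illustrative only; the starting point $(0, 1)$ and the sample times are arbitrary choices.

```python
import math

def flow(y1, y2, t):
    """Exact solution of dy1/dt = y2, dy2/dt = -y1 started at (y1, y2)."""
    return (y1 * math.cos(t) + y2 * math.sin(t),
            -y1 * math.sin(t) + y2 * math.cos(t))

start = (0.0, 1.0)  # a point on the upper half circle
for t in [0.0, 0.5, 1.0, 1.5, 2.0]:
    y1, y2 = flow(start[0], start[1], t)
    on_circle = abs(y1 * y1 + y2 * y2 - 1.0) < 1e-12
    on_upper_half = on_circle and y2 > 0
    print(t, on_circle, on_upper_half)

# The trajectory reaches the boundary point (1, 0) at t = pi / 2.
print(flow(start[0], start[1], math.pi / 2))
```

The trajectory never leaves the circle, but it stays in the upper half only up to time $\pi/2$, when it exits through the relative boundary point $(1, 0)$.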

3.3 The formalized problem

3.3.1 The Space

As our basic space of forward rate curves we will use a weighted Sobolev space, where a generic point will be denoted by $r$.

Definition 3.2 Consider a fixed real number $\gamma > 0$. The space $\mathcal{H}_{\gamma}$ is defined as the space of all differentiable (in the distributional sense) functions $r : R_+ \to R$ satisfying the norm condition $\|r\|_{\gamma} < \infty$. Here the norm is defined as

$$ \|r\|_{\gamma}^2 = \int_0^{\infty} r^2(x)\, e^{-\gamma x}\,dx + \int_0^{\infty} \left( \frac{dr}{dx}(x) \right)^2 e^{-\gamma x}\,dx. $$

Remark 3.3 The variable $x$ is as before interpreted as time to maturity. With the inner product

$$ (r, q) = \int_0^{\infty} r(x) q(x)\, e^{-\gamma x}\,dx + \int_0^{\infty} \frac{dr}{dx}(x) \frac{dq}{dx}(x)\, e^{-\gamma x}\,dx, $$

the space $\mathcal{H}_{\gamma}$ becomes a Hilbert space. Because of the exponential weighting function all constant forward rate curves will belong to the space. In the sequel we will suppress the subindex $\gamma$, writing $\mathcal{H}$ instead of $\mathcal{H}_{\gamma}$.
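As a quick check of the last claim, the norm of a constant curve and of a linearly increasing curve can be computed directly. This is a worked illustration added here, using only the norm of Definition 3.2.

```latex
\[
r(x) \equiv c: \qquad
\|r\|_{\gamma}^2 = \int_0^{\infty} c^2 e^{-\gamma x}\,dx + \int_0^{\infty} 0 \cdot e^{-\gamma x}\,dx
= \frac{c^2}{\gamma} < \infty ,
\]
\[
r(x) = x: \qquad
\|r\|_{\gamma}^2 = \int_0^{\infty} x^2 e^{-\gamma x}\,dx + \int_0^{\infty} e^{-\gamma x}\,dx
= \frac{2}{\gamma^3} + \frac{1}{\gamma} < \infty .
\]
```

Without the weight $e^{-\gamma x}$ the first integral would diverge in both cases, so neither curve would belong to an unweighted Sobolev space on $R_+$.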


3.3.2 The Forward Curve Manifold

We consider as given a mapping

$$ G : \mathcal{Z} \to \mathcal{H}, \tag{38} $$

where the parameter space $\mathcal{Z}$ is an open connected subset of $R^d$, i.e. for each parameter value $z \in \mathcal{Z} \subseteq R^d$ we have a curve $G(z) \in \mathcal{H}$. The value of this curve at the point $x \in R_+$ will be written as $G(z, x)$, so we see that $G$ can also be viewed as a mapping

$$ G : \mathcal{Z} \times R_+ \to R. \tag{39} $$

The mapping $G$ is thus a formalization of the idea of a finitely parameterized family of forward rate curves, and we now define the forward curve manifold as the set of all forward rate curves produced by this family.

Definition 3.4 The forward curve manifold $\mathcal{G} \subseteq \mathcal{H}$ is defined as $\mathcal{G} = \mathrm{Im}(G)$.

3.3.3 The Interest Rate Model

We take as given a volatility function $\sigma$ of the form

$$ \sigma : \mathcal{H} \times R_+ \to R^m, $$

i.e. $\sigma(r, x)$ is a functional of the infinite dimensional $r$-variable, and a function of the real variable $x$. Denoting the forward rate curve at time $t$ by $r_t$ we then have the following forward rate equation.

$$ dr_t(x) = \left\{ \frac{\partial}{\partial x} r_t(x) + \sigma(r_t, x) \int_0^x \sigma(r_t, u)^{\top}\,du \right\} dt + \sigma(r_t, x)\,dW_t. \tag{40} $$

Remark 3.5 For notational simplicity we have assumed that the $r$-dynamics are time homogeneous. The case when $\sigma$ is of the form $\sigma(t, r, x)$ can be treated in exactly the same way. See Björk and Christensen (1999).

We need some regularity assumptions, and the main ones are as follows. See Björk (1997) for technical details.

Assumption 3.6 We assume the following.

• The volatility mapping $r \mapsto \sigma(r)$ is smooth.
• The mapping $z \mapsto G(z)$ is a smooth embedding, so in particular the Fréchet derivative $G_z(z)$ is injective for all $z \in \mathcal{Z}$.
• For every initial point $r_0 \in \mathcal{G}$, there exists a unique strong solution in $\mathcal{H}$ of Equation (40).

7. A Geometric View of Interest Rate Theory


3.3.4 The Problem

Our main problem is the following.
• Suppose that we are given
  – a volatility $\sigma$, specifying an interest rate model $\mathcal{M}$ as in (40),
  – a mapping $G$, specifying a forward curve manifold $\mathcal{G}$.
• Is $\mathcal{G}$ then invariant under the action of $r$?

3.4 The invariance conditions

In order to study the invariance problem we need to introduce some compact notation.

Definition 3.7 We define $\mathbf{H}\sigma$ by

$$\mathbf{H}\sigma(r, x) = \int_0^x \sigma(r, s)\,ds.$$

Suppressing the $x$-variable, the Itô dynamics for the forward rates are thus given by

$$dr_t = \left\{ \frac{\partial}{\partial x} r_t + \sigma(r_t)\,\mathbf{H}\sigma(r_t)^{\top} \right\} dt + \sigma(r_t)\,dW_t \tag{41}$$

and we write this more compactly as

$$dr_t = \mu_0(r_t)\,dt + \sigma(r_t)\,dW_t, \tag{42}$$

where the drift $\mu_0$ is given by the bracket term in (41). To get some intuition we now formally "divide by $dt$" and obtain

$$\frac{dr}{dt} = \mu_0(r_t) + \sigma(r_t)\,\dot{W}_t, \tag{43}$$

where the formal time derivative $\dot{W}_t$ is interpreted as an "input signal" chosen by chance. As in Section 2.3 we are thus led to study the associated deterministic control system

$$\frac{dr}{dt} = \mu_0(r_t) + \sigma(r_t)\,u_t. \tag{44}$$

The intuitive idea is now that $\mathcal{G}$ is invariant under (42) if and only if $\mathcal{G}$ is invariant under (44) for all choices of the input signal $u$. It is furthermore geometrically obvious that this happens if and only if the velocity vector $\mu_0(r) + \sigma(r)u$ is tangential to $\mathcal{G}$ for all points $r \in \mathcal{G}$ and all choices of $u \in \mathbb{R}^m$. Since the tangent space of $\mathcal{G}$ at a point $G(z)$ is given by $\operatorname{Im}[G_z(z)]$, where $G_z$ denotes the Fréchet derivative (Jacobian), we are led to conjecture that $\mathcal{G}$ is invariant if and only if the condition

$$\mu_0(r) + \sigma(r)u \in \operatorname{Im}[G_z(z)]$$

is satisfied for all $u \in \mathbb{R}^m$. This can also be written

$$\mu_0(r) \in \operatorname{Im}[G_z(z)], \qquad \sigma(r) \in \operatorname{Im}[G_z(z)],$$

where the last inclusion is interpreted componentwise for $\sigma$.

This "result" is, however, not correct due to the fact that the argument above neglects the difference between ordinary calculus, which is used for (44), and Itô calculus, which governs (42). In order to bridge this gap we have to rewrite the analysis in terms of Stratonovich integrals instead of Itô integrals.

Definition 3.8 For given semimartingales $X$ and $Y$, the Stratonovich integral of $X$ with respect to $Y$, $\int_0^t X(s) \circ dY(s)$, is defined as

$$\int_0^t X_s \circ dY_s = \int_0^t X_s\,dY_s + \frac{1}{2}\langle X, Y\rangle_t. \tag{45}$$

The first term on the right-hand side is the Itô integral. In the present case, with only Wiener processes as driving noise, we can define the "quadratic variation process" $\langle X, Y\rangle$ in (45) by

$$d\langle X, Y\rangle_t = dX_t\,dY_t, \tag{46}$$

with the usual "multiplication rules" $dW \cdot dt = dt \cdot dt = 0$, $dW \cdot dW = dt$. We now recall the main result and raison d'être for the Stratonovich integral.

Proposition 3.9 (Chain rule) Assume that the function $F(t, y)$ is smooth. Then we have

$$dF(t, Y_t) = \frac{\partial F}{\partial t}(t, Y_t)\,dt + \frac{\partial F}{\partial y} \circ dY_t. \tag{47}$$

Thus, in the Stratonovich calculus, the Itô formula takes the form of the standard chain rule of ordinary calculus. Returning to (42), the Stratonovich dynamics are given by

$$dr_t = \left\{ \frac{\partial}{\partial x} r_t + \sigma(r_t)\,\mathbf{H}\sigma(r_t)^{\top} \right\} dt - \frac{1}{2}\,d\langle \sigma(r_t), W_t\rangle + \sigma(r_t) \circ dW_t. \tag{48}$$
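The relation (45) between the two integrals can be made concrete in the simplest case $X = Y = W$, where $\langle W, W\rangle_t = t$. The sketch below is an illustration only (the function name, step count and seed are assumptions, not from the text): it discretizes $\int_0^T W\,dW$ along one simulated path, once with the integrand taken at the left endpoint (Itô) and once with the average of the two endpoint values (Stratonovich).

```python
import random

def ito_and_stratonovich_sums(T=1.0, n=100_000, seed=0):
    """Discretize int_0^T W dW along one simulated Brownian path.

    Returns the left-point (Ito) sum, the endpoint-average (Stratonovich)
    sum, the realized quadratic variation, and the terminal value W_T.
    """
    rng = random.Random(seed)
    dt = T / n
    w = 0.0
    ito = 0.0
    strat = 0.0
    quad_var = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, dt ** 0.5)
        ito += w * dw                  # integrand at the left endpoint
        strat += (w + 0.5 * dw) * dw   # average of the two endpoint values
        quad_var += dw * dw            # discrete version of <W, W>_T
        w += dw
    return ito, strat, quad_var, w

ito, strat, quad_var, w_T = ito_and_stratonovich_sums()

# Stratonovich obeys the ordinary chain rule: int W o dW = W_T^2 / 2.
print(strat, 0.5 * w_T ** 2)
# The gap between the two sums is half the quadratic variation, as in (45).
print(strat - ito, 0.5 * quad_var)
```

The Stratonovich sum telescopes to $W_T^2/2$, which is what the ordinary chain rule predicts, while the Itô sum falls short by half the realized quadratic variation, which is close to $T/2$ for a fine grid.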


In order to compute the Stratonovich correction term above we use the infinite dimensional Itô formula (see Da Prato and Zabczyk (1992)) to obtain

$$d\sigma(r_t) = \{\cdots\}\,dt + \sigma_r(r_t)\,\sigma(r_t)\,dW_t, \tag{49}$$

where $\sigma_r$ denotes the Fréchet derivative of $\sigma$ w.r.t. the infinite dimensional $r$-variable. From this we immediately obtain

$$d\langle \sigma(r_t), W_t\rangle = \sigma_r(r_t)\,\sigma(r_t)\,dt. \tag{50}$$

Remark 3.10 If the Wiener process $W$ is multidimensional, then $\sigma$ is a vector $\sigma = [\sigma_1, \ldots, \sigma_m]$, and the right-hand side of (50) should be interpreted as

$$\sigma_r(r_t)\,\sigma(r_t) = \sum_{i=1}^{m} \sigma_{ir}(r_t)\,\sigma_i(r_t).$$

Thus (48) becomes

$$dr_t = \left\{ \frac{\partial}{\partial x} r_t + \sigma(r_t)\,\mathbf{H}\sigma(r_t)^{\top} - \frac{1}{2}\,\sigma_r(r_t)\,\sigma(r_t) \right\} dt + \sigma(r_t) \circ dW_t. \tag{51}$$

We now write (51) as

$$dr_t = \mu(r_t)\,dt + \sigma(r_t) \circ dW_t, \tag{52}$$

where

$$\mu(r, x) = \frac{\partial}{\partial x} r(x) + \sigma(r, x) \int_0^x \sigma(r, u)^{\top}\,du - \frac{1}{2}\bigl[\sigma_r(r)\,\sigma(r)\bigr](x). \tag{53}$$

Given the heuristics above, our main result is not surprising. The formal proof, which is somewhat technical, is left out. See Björk and Christensen (1999).

Theorem 3.11 (Main theorem) The forward curve manifold $\mathcal{G}$ is locally invariant for the forward rate process $r(t, x)$ in $\mathcal{M}$ if and only if

$$G_x(z) + \sigma(r)\,\mathbf{H}\sigma(r)^{\top} - \frac{1}{2}\,\sigma_r(r)\,\sigma(r) \in \operatorname{Im}[G_z(z)], \tag{54}$$

$$\sigma(r) \in \operatorname{Im}[G_z(z)], \tag{55}$$

hold for all $z \in \mathcal{Z}$ with $r = G(z)$.

Here, $G_z$ and $G_x$ denote the Fréchet derivative of $G$ with respect to $z$ and $x$, respectively. The condition (55) is interpreted componentwise for $\sigma$. Condition (54) is called the consistent drift condition, and (55) is called the consistent volatility condition.


Remark 3.12 It is easily seen that if the family $\mathcal{G}$ is invariant under shifts in the $x$-variable, then we will automatically have the relation $G_x(z) \in \operatorname{Im}[G_z(z)]$, so in this case the relation (54) can be replaced by

$$\sigma(r)\,\mathbf{H}\sigma(r)^{\top} - \frac{1}{2}\,\sigma_r(r)\,\sigma(r) \in \operatorname{Im}[G_z(z)],$$

with $r = G(z)$ as usual.
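The step behind Remark 3.12 can be sketched as follows. The argument assumes, beyond what the text states, that the reparameterization induced by a shift depends smoothly on the size of the shift. The Nelson–Siegel family of Section 3.5.1 is used as an illustration.

```latex
% Assumption (for illustration): for each z and each small h >= 0 there is a
% parameter value z(h), smooth in h with z(0) = z, such that
\[
  G(z, x + h) = G\bigl(z(h), x\bigr) \qquad \text{for all } x \ge 0 .
\]
% Differentiating with respect to h at h = 0 gives
\[
  G_x(z, x) = G_z(z, x)\, z'(0) \in \operatorname{Im}[G_z(z)] .
\]
% Example: for the Nelson--Siegel family (56),
\[
  z_1 + z_2 e^{-z_4 (x+h)} + z_3 (x+h)\, e^{-z_4 (x+h)}
  = z_1 + (z_2 + z_3 h)\, e^{-z_4 h}\, e^{-z_4 x}
        + z_3 e^{-z_4 h}\, x e^{-z_4 x},
\]
% so that z(h) = (z_1, (z_2 + z_3 h) e^{-z_4 h}, z_3 e^{-z_4 h}, z_4) and
\[
  z'(0) = (0,\; z_3 - z_2 z_4,\; -z_3 z_4,\; 0),
\]
% which reproduces G_x in (58) as a combination of the columns of G_z in (57).
```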

3.5 Examples

The results above are extremely easy to apply in concrete situations. As a test case we consider the Nelson–Siegel (see Nelson and Siegel (1987)) family of forward rate curves. We analyze the consistency of this family with the Ho–Lee and Hull–White interest rate models. It should be emphasized that these examples are chosen only in order to illustrate the general methodology. For more examples and details, see Björk and Christensen (1999).

3.5.1 The Nelson–Siegel family

The Nelson–Siegel (henceforth NS) forward curve manifold $\mathcal{G}$ is parameterized by $z \in \mathbb{R}^4$, with the curve $x \mapsto G(z, x)$ given by

$$G(z, x) = z_1 + z_2 e^{-z_4 x} + z_3 x e^{-z_4 x}. \tag{56}$$

For $z_4 \neq 0$, the Fréchet derivatives are easily obtained as

$$G_z(z, x) = \left[\,1,\; e^{-z_4 x},\; x e^{-z_4 x},\; -(z_2 + z_3 x)\,x e^{-z_4 x}\,\right], \tag{57}$$

$$G_x(z, x) = (z_3 - z_2 z_4 - z_3 z_4 x)\,e^{-z_4 x}. \tag{58}$$

In order for the image of this map to be included in $\mathcal{H}_\gamma$, we need to impose the condition $z_4 > -\gamma/2$. In this case, the natural parameter space is thus $\mathcal{Z} = \{z \in \mathbb{R}^4 : z_4 \neq 0,\; z_4 > -\gamma/2\}$. However, as we shall see below, the results are uniform w.r.t. $\gamma$. Note that the mapping $G$ indeed is smooth, and for $z_4 \neq 0$, $G$ and $G_z$ are also injective.

In the degenerate case $z_4 = 0$, we have

$$G(z, x) = z_1 + z_2 + z_3 x. \tag{59}$$

We return to this case below.
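The derivative formulas (57) and (58) are easy to get wrong by a sign, so a finite-difference check is a useful sanity test. The sketch below is illustrative only: the parameter values and the helper `central_difference` are assumptions chosen for the example, not taken from the text.

```python
import math

def G(z, x):
    """Nelson-Siegel forward curve, as in (56)."""
    z1, z2, z3, z4 = z
    return z1 + z2 * math.exp(-z4 * x) + z3 * x * math.exp(-z4 * x)

def G_z(z, x):
    """Partial derivatives of G with respect to z1..z4, as in (57)."""
    _, z2, z3, z4 = z
    e = math.exp(-z4 * x)
    return [1.0, e, x * e, -(z2 + z3 * x) * x * e]

def G_x(z, x):
    """Derivative of G with respect to time to maturity, as in (58)."""
    _, z2, z3, z4 = z
    return (z3 - z2 * z4 - z3 * z4 * x) * math.exp(-z4 * x)

def central_difference(func, t, h=1e-6):
    """Second-order finite-difference approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

z = [0.04, -0.02, 0.03, 0.7]   # illustrative parameter values
x = 2.5                        # illustrative time to maturity

# Numerical gradient in z, one coordinate at a time.
numeric_G_z = []
for i in range(4):
    def along_axis(t, i=i):
        shifted = list(z)
        shifted[i] = t
        return G(shifted, x)
    numeric_G_z.append(central_difference(along_axis, z[i]))

numeric_G_x = central_difference(lambda t: G(z, t), x)

max_error = max(
    [abs(a - b) for a, b in zip(numeric_G_z, G_z(z, x))]
    + [abs(numeric_G_x - G_x(z, x))]
)
print(max_error)
```

The largest discrepancy is at the level of finite-difference noise, which supports the signs and factors in (57) and (58) at the chosen point.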


3.5.2 The Hull–White and Ho–Lee models

As our test case, we analyze the Hull and White (1990) (henceforth HW) extension of the Vasiček model. On short rate form the model is given by

$$dR(t) = \{\Phi(t) - aR(t)\}\,dt + \sigma\,dW(t), \tag{60}$$

where $a, \sigma > 0$. As is well known, the corresponding forward rate formulation is

$$dr(t, x) = \beta(t, x)\,dt + \sigma e^{-ax}\,dW_t. \tag{61}$$

Thus, the volatility function is given by $\sigma(x) = \sigma e^{-ax}$, and the conditions of Theorem 3.11 become

$$G_x(z, x) + \frac{\sigma^2}{a}\left[e^{-ax} - e^{-2ax}\right] \in \operatorname{Im}[G_z(z, x)], \tag{62}$$

$$\sigma e^{-ax} \in \operatorname{Im}[G_z(z, x)]. \tag{63}$$

To investigate whether the NS manifold is invariant under HW dynamics, we start with (63) and fix a $z$-vector. We then look for constants (possibly depending on $z$) $A$, $B$, $C$, and $D$, such that for all $x \geq 0$ we have

$$\sigma e^{-ax} = A + B e^{-z_4 x} + C x e^{-z_4 x} - D (z_2 + z_3 x)\,x e^{-z_4 x}. \tag{64}$$

This is possible if and only if $z_4 = a$, and since (63) must hold for all choices of $z \in \mathcal{Z}$ we immediately see that HW is inconsistent with the full NS manifold (see also the Notes below).

Proposition 3.13 (Nelson–Siegel and Hull–White) The Hull–White model is inconsistent with the NS family.

We have thus obtained a negative result for the HW model. The NS manifold is "too small" for HW, in the sense that if the initial forward rate curve is on the manifold, then the HW dynamics will force the term structure off the manifold within an arbitrarily short period of time. For more positive results see Björk and Christensen (1999).

Remark 3.14 It is an easy exercise to see that the minimal manifold which is consistent with HW is given by

$$G(z, x) = z_1 e^{-ax} + z_2 e^{-2ax}.$$

In the same way, one may easily test the consistency between NS and the model obtained by setting $a = 0$ in (60). This is the continuous time limit of the Ho and Lee model (Ho and Lee (1986)), and is henceforth referred to as HL. Since we have a pedagogical point to make, we give the results on consistency, which are as follows.


Proposition 3.15 (Nelson–Siegel and Ho–Lee)
(a) The full NS family is inconsistent with the Ho–Lee model.
(b) The degenerate family $G(z, x) = z_1 + z_3 x$ is in fact consistent with Ho–Lee.

Remark 3.16 We see that the minimal invariant manifold provides information about the model. From the result above, the HL model is closely tied to the class of affine forward rate curves. Such curves are unrealistic from an economic point of view, implying that the HL model is overly simplistic.
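The span conditions used in this section can also be probed numerically. The sketch below is a discrete stand-in, not a proof: functions are replaced by their values on a maturity grid, and membership in $\operatorname{Im}[G_z(z)]$ is replaced by a small relative least-squares residual, computed here by modified Gram–Schmidt. The helper names, the grid and all parameter values are assumptions made for the illustration.

```python
import math

def relative_residual(target, columns):
    """Relative distance from `target` to span(columns) on a common grid."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    basis = []
    for col in columns:
        v = list(col)
        for q in basis:                    # modified Gram-Schmidt step
            c = dot(q, v)
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:                   # skip numerically dependent columns
            basis.append([a / norm for a in v])
    res = list(target)
    for q in basis:                        # remove the component in the span
        c = dot(q, res)
        res = [a - c * b for a, b in zip(res, q)]
    return math.sqrt(dot(res, res)) / math.sqrt(dot(target, target))

def ns_tangent_columns(z, grid):
    """Columns of G_z(z, x) for the Nelson-Siegel family, see (57)."""
    _, z2, z3, z4 = z
    return [
        [1.0 for x in grid],
        [math.exp(-z4 * x) for x in grid],
        [x * math.exp(-z4 * x) for x in grid],
        [-(z2 + z3 * x) * x * math.exp(-z4 * x) for x in grid],
    ]

grid = [0.05 * k for k in range(201)]      # maturities 0, 0.05, ..., 10
sigma, a = 0.01, 0.5                       # illustrative HW parameters
hw_vol = [sigma * math.exp(-a * x) for x in grid]

# Volatility condition (63): holds on the slice z4 = a, fails for z4 != a.
res_match = relative_residual(hw_vol, ns_tangent_columns([0.04, -0.02, 0.03, a], grid))
res_other = relative_residual(hw_vol, ns_tangent_columns([0.04, -0.02, 0.03, 3.0], grid))

# Ho-Lee (a = 0): sigma is constant, so sigma * H sigma = sigma^2 x.
# Drift condition for the degenerate family G = z1 + z3 x, where G_z = [1, x].
z3 = 0.03
hl_drift = [z3 + sigma ** 2 * x for x in grid]          # G_x + sigma^2 x
res_hl = relative_residual(hl_drift, [[1.0 for x in grid], list(grid)])

print(res_match, res_other, res_hl)
```

On the slice $z_4 = a$ the HW volatility is a multiple of the second column of $G_z$, so the residual vanishes up to rounding; for $z_4 \neq a$ it does not, in line with Proposition 3.13. The Ho–Lee drift term for the affine family lies in $\operatorname{span}\{1, x\}$, in line with Proposition 3.15(b).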

3.6 Notes

The section is based on Björk and Christensen (1999). As we very easily detected above, neither the HW nor the HL model is consistent with the Nelson–Siegel family of forward rate curves. A much more difficult problem is to determine whether any interest rate model is. This is Problem II in Section 3.1 for the NS family, and it has been solved recently (using different techniques) in Filipović (1998a), where it is shown that no nontrivial Wiener driven model is consistent with NS. Thus, for a model to be consistent with Nelson–Siegel, it must be deterministic. In Filipović (1998b) (which is a technical tour de force) this result is extended to a much larger exponential polynomial family than the NS family.

In our presentation we have used strong solutions of the infinite dimensional forward rate SDE. This is of course restrictive. The invariance problem for weak solutions has recently been studied in Filipović (1999). An alternative way of studying invariance is by using some version of the Stroock–Varadhan support theorem, and this line of thought is carried out in depth in Zabczyk (1992).

4 Existence of nonlinear realizations

We now turn to Problem 2 in Section 1.2, i.e. the problem of when a given forward rate model has a finite dimensional factor realization. For ease of exposition we mostly confine ourselves to a discussion of the case of a single driving Wiener process and to time invariant forward rate dynamics. Multidimensional Wiener processes and time varying systems can be treated similarly, and for completeness we state the results for the multidimensional case. We will use some ideas and concepts from differential geometry, and a general reference here is Warner (1979). The section is based on Björk and Svensson (1999).


4.1 Setup

In order to study the realization problem we need (see Remark 4.4) a very regular space to work in.

Definition 4.1 Consider a fixed real number $\gamma > 0$. The space $\mathcal{B}_\gamma$ is defined as the space of all infinitely differentiable functions $r : \mathbb{R}_+ \to \mathbb{R}$ satisfying the norm condition $\|r\|_\gamma < \infty$. Here the norm is defined as

$$\|r\|_\gamma^2 = \sum_{n=0}^{\infty} 2^{-n} \int_0^\infty \left(\frac{d^n r}{dx^n}(x)\right)^2 e^{-\gamma x}\,dx.$$

Note that $\mathcal{B}$ is not a space of distributions, but a space of functions. As with $\mathcal{H}$ we will often suppress the subindex $\gamma$. With the obvious inner product $\mathcal{B}$ is a pre-Hilbert space, and in Björk and Svensson (1999) the following result is proved.

Proposition 4.2 The space $\mathcal{B}$ is a Hilbert space, i.e. it is complete. Furthermore, every function in the space is in fact real analytic, and can thus be uniquely extended to a holomorphic function in the entire complex plane.

We now take as given a volatility $\sigma : \mathcal{B} \to \mathcal{B}$ and consider the induced forward rate model (on Stratonovich form)

$$dr_t = \mu(r_t)\,dt + \sigma(r_t) \circ dW_t, \tag{65}$$

where as before (see Section 3.4)

$$\mu(r) = \frac{\partial}{\partial x} r + \sigma(r)\,\mathbf{H}\sigma(r)^{\top} - \frac{1}{2}\,\sigma_r(r)\,\sigma(r). \tag{66}$$

We need some regularity assumptions.

Assumption 4.3 We assume that $\sigma$ is chosen such that the following hold.
• The mapping $\sigma$ is smooth.
• The mapping
$$r \mapsto \sigma(r)\,\mathbf{H}\sigma(r)^{\top} - \frac{1}{2}\,\sigma_r(r)\,\sigma(r)$$
is a smooth map from $\mathcal{B}$ to $\mathcal{B}$.

Remark 4.4 The reason for our choice of $\mathcal{B}$ as the underlying space is that the linear operator $\mathbf{F} = d/dx$ is bounded in this space. Together with the assumptions above, this implies that both $\mu$ and $\sigma$ are smooth vector fields on $\mathcal{B}$, thus ensuring the existence of a strong local solution to the forward rate equation for every initial point $r^o \in \mathcal{B}$.
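The boundedness of $\mathbf{F}$ claimed in Remark 4.4 follows directly from the geometric weights $2^{-n}$ in the norm of Definition 4.1. A short verification, given here as a sketch rather than as the argument of the cited paper:

```latex
\[
  \|\mathbf{F} r\|_\gamma^2
  = \sum_{n=0}^{\infty} 2^{-n} \int_0^\infty
      \Bigl(\frac{d^{\,n+1} r}{dx^{\,n+1}}(x)\Bigr)^2 e^{-\gamma x}\,dx
  = 2 \sum_{n=1}^{\infty} 2^{-n} \int_0^\infty
      \Bigl(\frac{d^{\,n} r}{dx^{\,n}}(x)\Bigr)^2 e^{-\gamma x}\,dx
  \le 2\,\|r\|_\gamma^2 ,
\]
% so that the operator norm satisfies \|\mathbf{F}\| \le \sqrt{2} on \mathcal{B}_\gamma.
% In the space \mathcal{H}_\gamma of Section 3.3.1 no such bound is available,
% since the norm there controls only the first derivative.
```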

4.2 The geometric problem

Given a specification of the volatility mapping $\sigma$, and an initial forward rate curve $r^o$, we now investigate when (and how) the corresponding forward rate process possesses a finite dimensional realization. We are thus looking for smooth $d$-dimensional vector fields $a$ and $b$, an initial point $z_0 \in \mathbb{R}^d$, and a mapping $G : \mathbb{R}^d \to \mathcal{B}$ such that $r$, locally in time, has the representation

$$dZ_t = a(Z_t)\,dt + b(Z_t)\,dW_t, \qquad Z_0 = z_0, \tag{67}$$

$$r(t, x) = G(Z_t, x). \tag{68}$$

Remark 4.5 Let us clarify some points. Firstly, note that in principle it may well happen that, given a specification of $\sigma$, the $r$-model has a finite dimensional realization given a particular initial forward rate curve $r^o$, while being infinite dimensional for all other initial forward rate curves in a neighborhood of $r^o$. We say that such a model is a non-generic or accidental finite dimensional model. If, on the other hand, $r$ has a finite dimensional realization for all initial points in a neighborhood of $r^o$, then we say that the model is a generically finite dimensional model. In this text we are solely concerned with the generic problem. Secondly, let us emphasize that we are looking for local (in time) realizations.

We can now connect the realization problem to our studies of invariant manifolds.

Proposition 4.6 The forward rate process possesses a finite dimensional realization if and only if there exists an invariant finite dimensional submanifold $\mathcal{G}$ with $r^o \in \mathcal{G}$.

Proof See Björk and Christensen (1999) for the full proof. The intuitive argument runs as follows. Suppose that there exists a finite dimensional invariant manifold $\mathcal{G}$ with $r^o \in \mathcal{G}$. Then $\mathcal{G}$ has a local coordinate system, and we may define the $Z$ process as the local coordinate process for the $r$-process. On the other hand it is clear that if $r$ has a finite dimensional realization as in (67)–(68), then every forward rate curve that will be produced by the model is of the form $x \mapsto G(z, x)$ for some choice of $z$. Thus there exists a finite dimensional invariant submanifold $\mathcal{G}$ containing the initial forward rate curve $r^o$, namely $\mathcal{G} = \operatorname{Im} G$.

Using Theorem 3.11 we immediately obtain the following geometric characterization of the existence of a finite realization.


Corollary 4.7 The forward rate process possesses a finite dimensional realization if and only if there exists a finite dimensional manifold $\mathcal{G}$ containing $r^o$, such that, for each $r \in \mathcal{G}$, the following conditions hold:

$$\mu(r) \in T_{\mathcal{G}}(r), \qquad \sigma(r) \in T_{\mathcal{G}}(r).$$

Here $T_{\mathcal{G}}(r)$ denotes the tangent space to $\mathcal{G}$ at the point $r$, and the vector fields $\mu$ and $\sigma$ are as above.

4.3 The main result

Given the volatility vector field $\sigma$, and hence also the field $\mu$, we now are faced with the problem of determining whether there exists a finite dimensional manifold $\mathcal{G}$ with the property that $\mu$ and $\sigma$ are tangential to $\mathcal{G}$ at each point of $\mathcal{G}$. In the case when the underlying space is finite dimensional, this is a standard problem in differential geometry, and we will now give the heuristics.

To get some intuition we start with a simpler problem and therefore consider the space $\mathcal{B}$ (or any other Hilbert space), and a smooth vector field $f$ on the space. For each fixed point $r^o \in \mathcal{B}$ we now ask whether there exists a finite dimensional manifold $\mathcal{G}$ with $r^o \in \mathcal{G}$ such that $f$ is tangential to $\mathcal{G}$ at every point. The answer to this question is yes, and the manifold can in fact be chosen to be one-dimensional. To see this, consider the infinite dimensional ODE

$$\frac{dr_t}{dt} = f(r_t), \tag{69}$$

$$r_0 = r^o. \tag{70}$$

If $r_t$ is the solution, at time $t$, of this ODE, we use the notation $r_t = e^{ft} r^o$. We have thus defined a group of operators $\{e^{ft} : t \in \mathbb{R}\}$, and we note that the set $\{e^{ft} r^o : t \in \mathbb{R}\} \subseteq \mathcal{B}$ is nothing else than the integral curve of the vector field $f$, passing through $r^o$. If we define $\mathcal{G}$ as this integral curve, then our problem is solved, since $f$ will be tangential to $\mathcal{G}$ by construction.

Let us now take two vector fields $f_1$ and $f_2$ as given, where the reader informally can think of $f_1$ as $\sigma$ and $f_2$ as $\mu$. We also fix an initial point $r^o \in \mathcal{B}$ and the question is if there exists a finite dimensional manifold $\mathcal{G}$, containing $r^o$, with the property that $f_1$ and $f_2$ are both tangential to $\mathcal{G}$ at each point of $\mathcal{G}$. We call such a manifold a tangential manifold for the vector fields. At a first glance it would seem that there always exists a tangential manifold, and that it can even be chosen to be two-dimensional. The geometric idea is that we start at $r^o$ and let $f_1$ generate the integral curve $\{e^{f_1 s} r^o : s \geq 0\}$. For each point $e^{f_1 s} r^o$ on this curve we now let $f_2$ generate the integral curve starting at that point. This gives us the object $e^{f_2 t} e^{f_1 s} r^o$ and thus it seems that we sweep out a two-dimensional surface $\mathcal{G}$ in $\mathcal{B}$. This is our obvious candidate for a tangential manifold.

In the general case this idea will, however, not work, and the basic problem is as follows. In the construction above we started with the integral curve generated by $f_1$ and then applied $f_2$, and there is of course no guarantee that we will obtain the same surface if we start with $f_2$ and then apply $f_1$. We thus have some sort of commutativity problem, and the key concept is the Lie bracket.

Definition 4.8 Given smooth vector fields $f$ and $g$ on $\mathcal{B}$, the Lie bracket $[f, g]$ is a new vector field defined by

$$[f, g](r) = f'(r)\,g(r) - g'(r)\,f(r). \tag{71}$$
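A finite dimensional toy computation may help to fix the sign convention in (71), where $f'(r)$ and $g'(r)$ are read as Fréchet derivatives (Jacobians). The sketch below works on $\mathbb{R}^2$ rather than on $\mathcal{B}$; the two linear fields and the finite-difference helper are illustrative assumptions. For linear fields $f(r) = Ar$ and $g(r) = Br$ the bracket reduces to $(AB - BA)r$.

```python
def directional_derivative(field, r, v, h=1e-5):
    """Approximate field'(r) v, the derivative of `field` at r in direction v."""
    plus = field([ri + h * vi for ri, vi in zip(r, v)])
    minus = field([ri - h * vi for ri, vi in zip(r, v)])
    return [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]

def lie_bracket(f, g, r):
    """[f, g](r) = f'(r) g(r) - g'(r) f(r), the convention of (71)."""
    fg = directional_derivative(f, r, g(r))
    gf = directional_derivative(g, r, f(r))
    return [a - b for a, b in zip(fg, gf)]

# Two linear fields on R^2 (illustrative choices): f(r) = A r and g(r) = B r
# with A = [[0, 1], [-1, 0]] and B = [[1, 0], [0, 0]].
def f(r):
    return [r[1], -r[0]]

def g(r):
    return [r[0], 0.0]

r0 = [1.0, 2.0]
bracket_fg = lie_bracket(f, g, r0)   # (AB - BA) r0 = [-2, -1]
bracket_ff = lie_bracket(f, f, r0)   # the bracket of a field with itself vanishes

print(bracket_fg)
print(bracket_ff)
```

Since $AB \neq BA$ here, the bracket is nonzero: flowing along $f$ and then $g$ does not give the same point as flowing along $g$ and then $f$, which is the commutativity problem described above.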

The Lie bracket measures the lack of commutativity on the infinitesimal scale in our geometric program above, and for the procedure to work we need a condition which says that the lack of commutativity is "small". It turns out that the relevant condition is that the Lie bracket should be in the linear hull of the vector fields.

Definition 4.9 Let $f_1, \ldots, f_n$ be smooth independent vector fields on some space $X$. Such a system is called a distribution, and the distribution is said to be involutive if

$$[f_i, f_j](x) \in \operatorname{span}\{f_1(x), \ldots, f_n(x)\}, \qquad \forall i, j,$$

where the span is the linear hull over the real numbers.

We now have the following basic result, which extends a classic result from finite dimensional differential geometry (see Warner (1979)).

Theorem 4.10 (Frobenius) Let $f_1, \ldots, f_k$ be independent smooth vector fields in $\mathcal{B}$ and consider a fixed point $r^o \in \mathcal{B}$. Then the following statements are equivalent.
• For each point $r$ in a neighborhood of $r^o$, there exists a $k$-dimensional tangential manifold passing through $r$.
• The system $f_1, \ldots, f_k$ of vector fields is (locally) involutive.

Proof See Björk and Svensson (1999), which provides a self contained proof of the Frobenius theorem in Banach space.

Let us now go back to our interest rate model. We are thus given the vector fields $\mu$, $\sigma$, and an initial point $r^o$, and the problem is whether there exists a finite dimensional tangential manifold containing $r^o$. Using the infinite dimensional Frobenius theorem, this situation is now easily analyzed. If $\{\mu, \sigma\}$ is involutive then there exists a two-dimensional tangential manifold. If $\{\mu, \sigma\}$ is not involutive, this means that the Lie bracket $[\mu, \sigma]$ is not in the linear span of $\mu$ and $\sigma$, so then we consider the system $\{\mu, \sigma, [\mu, \sigma]\}$. If this system is involutive there exists a three-dimensional tangential manifold. If it is not involutive at least one of the brackets $[\mu, [\mu, \sigma]]$, $[\sigma, [\mu, \sigma]]$ is not in the span of $\{\mu, \sigma, [\mu, \sigma]\}$, and we then adjoin this (these) bracket(s). We continue in this way, forming brackets of brackets, and adjoining these to the linear hull of the previously obtained vector fields, until the point when the system of vector fields thus obtained actually is closed under the Lie bracket operation.

Definition 4.11 Take the vector fields $f_1, \ldots, f_k$ as given. The Lie algebra generated by $f_1, \ldots, f_k$ is the smallest linear space (over $\mathbb{R}$) of vector fields which contains $f_1, \ldots, f_k$ and is closed under the Lie bracket. This Lie algebra is denoted by

$$\mathcal{L} = \{f_1, \ldots, f_k\}_{LA}.$$

The dimension of $\mathcal{L}$ is defined, for each point $r \in \mathcal{B}$, as

$$\dim[\mathcal{L}(r)] = \dim \operatorname{span}\{f_1(r), \ldots, f_k(r)\}.$$

Putting all these results together, we have the following main result on finite dimensional realizations.

Theorem 4.12 (Main result) Take the volatility mapping $\sigma = (\sigma_1, \ldots, \sigma_m)$ as given. Then the forward rate model generated by $\sigma$ generically admits a finite dimensional realization if and only if

$$\dim\{\mu, \sigma_1, \ldots, \sigma_m\}_{LA} < \infty$$

in a neighborhood of $r^o$.

The result above thus provides a general solution to Problem II from Section 1.2. For any given specification of forward rate volatilities, the Lie algebra can in principle be computed, and the dimension can be checked. Note, however, that the theorem is a pure existence result. If, for example, the Lie algebra has dimension five, then we know that there exists a five-dimensional realization, but the theorem does not directly tell us how to construct a concrete realization. This is the subject of ongoing research. Note also that realizations are not unique, since any diffeomorphic mapping of the factor space $\mathbb{R}^d$ onto itself will give a new equivalent realization.

When computing the Lie algebra generated by $\mu$ and $\sigma$, the following observations are often useful.


Lemma 4.13 Take the vector fields $f_1, \ldots, f_k$ as given. The Lie algebra $\mathcal{L} = \{f_1, \ldots, f_k\}_{LA}$ remains unchanged under the following operations.
• The vector field $f_i(r)$ may be replaced by $\alpha(r) f_i(r)$, where $\alpha$ is any smooth nonzero scalar field.
• The vector field $f_i(r)$ may be replaced by
$$f_i(r) + \sum_{j \neq i} \alpha_j(r) f_j(r),$$
where $\alpha_j$ is any smooth scalar field.

Proof The first point is geometrically obvious, since multiplication by a scalar field will only change the length of the vector field $f_i$, and not its direction, and thus not the tangential manifold. Formally it follows from the "Leibniz rule" $[f, \alpha g] = \alpha [f, g] - (\alpha' f) g$. The second point follows from the bilinear property of the Lie bracket together with the fact that $[f, f] = 0$.

4.4 Applications

In this section we give some simple applications of the theory developed above. For more examples and results, see Björk and Svensson (1999).

4.4.1 Constant Volatility

We start with the simplest case, which is when the volatility $\sigma(r, x)$ is a constant vector in $\mathcal{B}$. We are thus back in the framework of Section 2, and we assume for simplicity that we have only one driving Wiener process. Then we have no Stratonovich correction term and the vector fields are given by

$$\mu(r, x) = \mathbf{F}r(x) + \sigma(x) \int_0^x \sigma(s)\,ds, \qquad \sigma(r, x) = \sigma(x),$$

where as before $\mathbf{F} = \frac{\partial}{\partial x}$. The Fréchet derivatives are trivial in this case. Since $\mathbf{F}$ is linear (and bounded in our space), and $\sigma$ is constant as a function of $r$, we obtain

$$\mu_r = \mathbf{F}, \qquad \sigma_r = 0.$$

Thus the Lie bracket $[\mu, \sigma]$ is given by

$$[\mu, \sigma] = \mathbf{F}\sigma,$$

and in the same way we have

$$[\mu, [\mu, \sigma]] = \mathbf{F}^2\sigma.$$

Continuing in the same manner it is easily seen that the relevant Lie algebra $\mathcal{L}$ is given by

$$\mathcal{L} = \{\mu, \sigma\}_{LA} = \operatorname{span}\{\mu, \sigma, \mathbf{F}\sigma, \mathbf{F}^2\sigma, \ldots\} = \operatorname{span}\{\mu, \mathbf{F}^n\sigma;\; n = 0, 1, 2, \ldots\}.$$

It is thus clear that $\mathcal{L}$ is finite dimensional (at each point $r$) if and only if the function space

$$\operatorname{span}\{\mathbf{F}^n\sigma;\; n = 0, 1, 2, \ldots\}$$

is finite dimensional. We have thus obtained our old condition from Proposition 2.12 and we have the following result which extends Proposition 2.4 by in principle allowing the realization to be nonlinear.

Proposition 4.14 Under the above assumptions, there exists a finite dimensional realization if and only if $\sigma$ is a quasi-exponential function.

4.4.2 Constant Direction Volatility

We go on to study the most natural extension of the deterministic volatility case (still in the case of a scalar Wiener process), namely the case when the volatility is of the form

$$\sigma(r, x) = \varphi(r)\lambda(x). \tag{72}$$

In this case the individual vector field $\sigma$ has the constant direction $\lambda \in \mathcal{H}$, but is of varying length, determined by $\varphi$, where $\varphi$ is allowed to be any smooth functional of the entire forward rate curve. In order to avoid trivialities we make the following assumption.

Assumption 4.15 We assume that $\varphi(r) \neq 0$ for all $r \in \mathcal{H}$.

After a simple calculation the drift vector $\mu$ turns out to be

$$\mu(r) = \mathbf{F}r + \varphi^2(r) D - \frac{1}{2}\,\varphi'(r)[\lambda]\,\varphi(r)\lambda, \tag{73}$$

where $\varphi'(r)[\lambda]$ denotes the Fréchet derivative $\varphi'(r)$ acting on the vector $\lambda$, and where the constant vector $D \in \mathcal{H}$ is given by

$$D(x) = \lambda(x) \int_0^x \lambda(s)\,ds.$$
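Both Proposition 4.14 and the analysis that follows turn on whether the space spanned by the iterated derivatives $\mathbf{F}^n\sigma$ (or $\mathbf{F}^n\lambda$) is finite dimensional. The sketch below illustrates this for the special quasi-exponential form $p(x)e^{-ax}$ with $p$ a polynomial; this restriction, the coefficient-list representation and the function names are assumptions made for the example, and the code computes only the dimension of the derivative span, not of the full Lie algebra. Exact rational arithmetic avoids any rank ambiguity.

```python
from fractions import Fraction

def differentiate(coeffs, a):
    """Apply F = d/dx to p(x) e^{-a x}, where p has coefficient list `coeffs`.

    d/dx [p(x) e^{-a x}] = (p'(x) - a p(x)) e^{-a x}; the degree never grows.
    """
    n = len(coeffs)
    out = []
    for k in range(n):
        derivative_term = (k + 1) * coeffs[k + 1] if k + 1 < n else 0
        out.append(derivative_term - a * coeffs[k])
    return out

def rank(rows):
    """Rank of a list of coefficient vectors, by exact Gaussian elimination."""
    rows = [list(row) for row in rows]
    rank_count = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next(
            (i for i in range(rank_count, len(rows)) if rows[i][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for i in range(rank_count + 1, len(rows)):
            factor = rows[i][col] / rows[rank_count][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank_count])]
        rank_count += 1
    return rank_count

def dimension_of_derivative_span(coeffs, a, n_derivatives=10):
    """dim span{F^n sigma; n = 0..n_derivatives} for sigma = p(x) e^{-a x}."""
    a = Fraction(a)
    current = [Fraction(c) for c in coeffs]
    rows = [current]
    for _ in range(n_derivatives):
        current = differentiate(current, a)
        rows.append(current)
    return rank(rows)

dim_hull_white = dimension_of_derivative_span([1], Fraction(1, 2))   # e^{-x/2}
dim_hump = dimension_of_derivative_span([0, 1], Fraction(1, 2))      # x e^{-x/2}
dim_ho_lee = dimension_of_derivative_span([1], 0)                    # constant

print(dim_hull_white, dim_hump, dim_ho_lee)
```

For functions of this form the span can never exceed the number of polynomial coefficients, which is why the dimension stabilizes after finitely many derivatives. A function that is not quasi-exponential cannot be represented in this way, and the sketch says nothing about such cases.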


We now want to know under what conditions on $\varphi$ and $\lambda$ we have a finite dimensional realization, i.e. when the Lie algebra generated by

$$\mu(r) = \mathbf{F}r + \varphi^2(r) D - \frac{1}{2}\,\varphi'(r)[\lambda]\,\varphi(r)\lambda, \qquad \sigma(r) = \varphi(r)\lambda,$$

is finite dimensional. Under Assumption 4.15 we can use Lemma 4.13 to see that the Lie algebra is in fact generated by the simpler system of vector fields

$$f_0(r) = \mathbf{F}r + \Phi(r) D, \qquad f_1(r) = \lambda,$$

where we have used the notation $\Phi(r) = \varphi^2(r)$. Since the field $f_1$ is constant, it has zero Fréchet derivative. Thus the first Lie bracket is easily computed as

$$[f_0, f_1](r) = \mathbf{F}\lambda + \Phi'(r)[\lambda]\,D.$$

The next bracket to compute is $[[f_0, f_1], f_1]$ which is given by

$$[[f_0, f_1], f_1] = \Phi''(r)[\lambda; \lambda]\,D.$$

Note that $\Phi''(r)[\lambda; \lambda]$ is the second order Fréchet derivative of $\Phi$ operating on the vector pair $[\lambda; \lambda]$. This pair is to be distinguished (notice the semicolon) from the Lie bracket $[\lambda, \lambda]$ (with a comma), which of course would be equal to zero. We now make a further assumption.

Assumption 4.16 We assume that $\Phi''(r)[\lambda; \lambda] \neq 0$ for all $r \in \mathcal{H}$.

Given this assumption we may again use Lemma 4.13 to see that the Lie algebra is generated by the following vector fields

$$f_0(r) = \mathbf{F}r, \qquad f_1(r) = \lambda, \qquad f_3(r) = \mathbf{F}\lambda, \qquad f_4(r) = D.$$

Of these vector fields, all but $f_0$ are constant, so all brackets are easy. After elementary calculations we see that in fact

$$\{\mu, \sigma\}_{LA} = \operatorname{span}\{\mathbf{F}r,\; \mathbf{F}^n\lambda,\; \mathbf{F}^n D;\; n = 0, 1, \ldots\}.$$


From this expression it follows immediately that a necessary condition for the Lie algebra to be finite dimensional is that the vector space spanned by $\{\mathbf{F}^n\lambda;\; n \geq 0\}$ is finite dimensional. This occurs if and only if $\lambda$ is quasi-exponential (see Remark 2.5). If, on the other hand, $\lambda$ is quasi-exponential, then we know from Lemma 2.6 that $D$ is also quasi-exponential, since it is the integral of the QE function $\lambda$ multiplied by the QE function $\lambda$. Thus the space spanned by $\{\mathbf{F}^n D;\; n = 0, 1, \ldots\}$ is also finite dimensional, and we have proved the following result.

Proposition 4.17 Under Assumptions 4.15 and 4.16, the interest rate model with volatility given by $\sigma(r, x) = \varphi(r)\lambda(x)$ has a finite dimensional realization if and only if $\lambda$ is a quasi-exponential function. The scalar field $\varphi$ is allowed to be any smooth field.

4.4.3 When is the Short Rate a Markov Process?

One of the classical problems concerning the HJM approach to interest rate modeling is that of determining when a given forward rate model is realized by a short rate model, i.e. when the short rate is Markovian. We now briefly indicate how the theory developed above can be used in order to analyze this question. For the full theory see Björk and Svensson (1999).

Using the results above, we immediately have the following general necessary condition.

Proposition 4.18 The forward rate model generated by $\sigma$ is a generic short rate model, i.e. the short rate is generically a Markov process, only if

$$\dim\{\mu, \sigma\}_{LA} \leq 2. \tag{74}$$

Proof If the model is really a short rate model, then bond prices are given as $p(t, x) = F(t, R_t, x)$ where $F$ solves the term structure PDE. Thus bond prices, and hence forward rates, are generated by a two-dimensional factor model with time $t$ and the short rate $R$ as the state variables.

Remark 4.19 The most natural case is $\dim\{\mu, \sigma\}_{LA} = 2$. It is an open problem whether there exists a non-deterministic generic short rate model with $\dim\{\mu, \sigma\}_{LA} = 1$.

Note that condition (74) is only a necessary condition for the existence of a short rate realization. It guarantees that there exists a two-dimensional realization, but the question remains whether the realization can be chosen in such a way that the short rate and running time are the state variables. This question is completely resolved by the following central result.


Theorem 4.20 Assume that the model is not deterministic, and take as given a time invariant volatility $\sigma(r, x)$. Then there exists a short rate realization if and only if the vector fields $[\mu, \sigma]$ and $\sigma$ are parallel, i.e. if and only if there exists a scalar field $\alpha(r)$ such that the following relation holds (locally) for all $r$.

$$[\mu, \sigma](r) = \alpha(r)\,\sigma(r). \tag{75}$$

Proof See Björk and Svensson (1999).

It turns out that the class of generic short rate models is very small indeed. We have, in fact, the following result, which was first proved in Jeffrey (1995) (using techniques different from those above). See Björk and Svensson (1999) for a proof based on Theorem 4.20.

Theorem 4.21 Consider an HJM model with one driving Wiener process and a volatility structure of the form

$$\sigma(r, x) = g(R, x),$$

where $R = r(0)$ is the short rate. Then the model is a generic short rate model if and only if $g$ has one of the following forms.
• There exists a constant $c$ such that $g(R, x) \equiv c$.
• There exist constants $a$ and $c$ such that $g(R, x) = c e^{-ax}$.
• There exist constants $a$ and $b$, and a function $\alpha(x)$, where $\alpha$ satisfies a certain Riccati equation, such that $g(R, x) = \alpha(x)\sqrt{aR + b}$.

We immediately recognize these cases as the Ho–Lee model, the Hull–White extended Vasiček model, and the Hull–White extended Cox–Ingersoll–Ross model (Cox, Ingersoll and Ross (1985)). Thus, in this sense the only generic short rate models are the affine ones, and the moral of this, perhaps somewhat surprising, result is that most short rate models considered in the literature are not generic but "accidental". To understand the geometric picture one can think of the following program.

1. Choose an arbitrary short rate model, say of the form
$$dR_t = a(R_t)\,dt + b(R_t)\,dW_t$$
with a fixed initial point $R_0$.
2. Solve the associated PDE in order to compute bond prices. This will also produce:
• An initial forward rate curve $\hat{r}^o(x)$.
• Forward rate volatilities of the form $g(R, x)$.
3. Forget about the underlying short rate model, and take the forward rate volatility structure $g(R, x)$ as given in the forward rate equation.
4. Initiate the forward rate equation with an arbitrary initial forward rate curve $r^o(x)$.

The question is now whether the thus constructed forward rate model will produce a Markovian short rate process. Obviously, if you choose the initial forward rate curve $r^o$ as $r^o = \hat{r}^o$, then you are back where you started, and everything is OK. If, however, you choose another initial forward rate curve rather than $\hat{r}^o$, say the observed forward rate curve of today, then it is no longer clear that the short rate will be Markovian. What the theorem above says is that only the models listed above will produce a Markovian short rate model for all initial points in a neighborhood of $\hat{r}^o$. If you take another model (like, say, the Dothan model) then a generic choice of the initial forward rate curve will produce a short rate process which is not Markovian.
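For the two deterministic cases in Theorem 4.21 the parallelism condition (75) can be verified by hand, using the bracket $[\mu, \sigma] = \mathbf{F}\sigma$ computed in Section 4.4.1. The following is a sketch of that check, together with one deterministic volatility outside the list, chosen here purely for illustration.

```latex
% For a deterministic volatility \sigma(x), Section 4.4.1 gives
% [\mu, \sigma] = \mathbf{F}\sigma.
%
% Ho--Lee, \sigma(x) \equiv c:
\[
  [\mu, \sigma] = \mathbf{F}\sigma = 0 = \alpha\,\sigma
  \qquad \text{with } \alpha \equiv 0 .
\]
% Hull--White, \sigma(x) = c\,e^{-ax}:
\[
  [\mu, \sigma] = \mathbf{F}\sigma = -a c\,e^{-ax} = \alpha\,\sigma
  \qquad \text{with } \alpha \equiv -a .
\]
% An illustrative volatility outside the list, \sigma(x) = c\,x e^{-ax}:
\[
  \mathbf{F}\sigma = c\,(1 - a x)\,e^{-ax},
\]
% which is not a scalar multiple of \sigma, so (75) fails. Such a model still
% has a finite dimensional realization by Proposition 4.14, but by
% Theorem 4.20 it is not a generic short rate model.
```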

4.5 Notes

The section is based on Björk and Svensson (1999) where full proofs and further results can be found, and where also the time varying case is considered. In our study of the constant direction model above, $\varphi$ was allowed to be any smooth functional of the entire forward rate curve. The simpler special case when $\varphi$ is a point evaluation of the short rate, i.e. of the form $\varphi(r) = h(r(0))$, has been studied in Bhar and Chiarella (1997), Inui and Kijima (1998) and Ritchken and Sankarasubramanian (1995). All these cases fall within our present framework and the results are included as special cases of the general theory above. A different case, treated in Chiarella and Kwon (1998), occurs when $\sigma$ is a finite point evaluation, i.e. when $\sigma(t, r) = h(t, r(x_1), \ldots, r(x_k))$ for fixed benchmark maturities $x_1, \ldots, x_k$. In Chiarella and Kwon (1998) it is studied when the corresponding finite set of benchmark forward rates is Markovian.

A classic paper on Markovian short rates is Carverhill (1994), where a deterministic volatility of the form $\sigma(t, x)$ is considered. Theorem 4.21 was first stated and proved in Jeffrey (1995). See Eberlein and Raible (1999) for an example with a driving Lévy process.

The geometric ideas presented above and in Björk and Svensson (1999) are intimately connected to controllability problems in systems theory, where they have been used extensively (see Isidori (1989)). They have also been used in filtering theory, where the problem is to find a finite dimensional realization of the unnormalized conditional density process, the evolution of which is given by the Zakai equation. See Brockett (1981) for an overview of these areas.

References

Bhar, R. and Chiarella, C. (1997), Transformation of Heath–Jarrow–Morton models to Markovian systems. European Journal of Finance 3, 1, 1–26.
Björk, T. (1997), Interest Rate Theory. In W. Runggaldier (ed.), Financial Mathematics. Springer Lecture Notes in Mathematics, Vol. 1656. Springer-Verlag, Berlin.
Björk, T. and Christensen, B.J. (1999), Interest rate dynamics and consistent forward rate curves. Mathematical Finance 9, 4, 323–48.
Björk, T. and Gombani, A. (1999), Minimal realization of interest rate models. Finance and Stochastics 3, 4, 413–32.
Björk, T. and Svensson, L. (1999), On the existence of finite dimensional nonlinear realizations of interest rate models. Forthcoming in Mathematical Finance.
Brace, A. and Musiela, M. (1994), A multi factor Gauss Markov implementation of Heath Jarrow and Morton. Mathematical Finance 4, 3, 563–76.
Brockett, R.W. (1970), Finite Dimensional Linear Systems. Wiley, New York.
Brockett, R.W. (1981), Nonlinear systems and nonlinear estimation theory. In Stochastic Systems: The Mathematics of Filtering and Identification and Applications (eds. Hazewinkel, M. and Willems, J.C.). Reidel, Dordrecht.
Carverhill, A. (1994), When is the spot rate Markovian? Mathematical Finance 4, 305–12.
Chiarella, C. and Kwon, K. (1998), Forward rate dependent Markovian transformations of the Heath–Jarrow–Morton term structure model. Working paper. School of Finance and Economics, University of Technology, Sydney.
Cox, J., Ingersoll, J. and Ross, S. (1985), A theory of the term structure of interest rates. Econometrica 53, 385–408.
Da Prato, G. and Zabczyk, J. (1992), Stochastic Equations in Infinite Dimensions. Cambridge University Press, Cambridge.
Duffie, D. and Kan, R. (1996), A yield factor model of interest rates. Mathematical Finance 6, 379–406.
Eberlein, E. and Raible, S. (1999), Term structure models driven by general Lévy processes. Mathematical Finance 9, 31–53.
El Karoui, N. and Lacoste, V. (1993), Multifactor models of the term structure of interest rates. Preprint.
El Karoui, N., Geman, H. and Lacoste, V. (1997), On the role of state variables in interest rate models. Preprint.
Filipović, D. (1998a), A note on the Nelson–Siegel family. Mathematical Finance 9, 4, 349–59.
Filipović, D. (1998b), Exponential–polynomial families and the term structure of interest rates. To appear in Bernoulli.
Filipović, D. (1999), Invariant manifolds for weak solutions of stochastic equations. To appear in Probability Theory and Related Fields.
Heath, D., Jarrow, R. and Morton, A. (1992), Bond pricing and the term structure of interest rates. Econometrica 60, 1, 77–106.
Ho, T. and Lee, S. (1986), Term structure movements and pricing interest rate contingent claims. Journal of Finance 41, 1011–29.
Hull, J. and White, A. (1990), Pricing interest-rate-derivative securities. The Review of Financial Studies 3, 573–92.
Inui, K. and Kijima, M. (1998), A Markovian framework in multi-factor Heath–Jarrow–Morton models. JFQA 33, 3, 423–40.
Isidori, A. (1989), Nonlinear Control Systems. Springer-Verlag, Berlin.
Jeffrey, A. (1995), Single factor Heath–Jarrow–Morton term structure models based on Markovian spot interest rates. JFQA 30, 4, 619–42.
Musiela, M. (1993), Stochastic PDEs and term structure models. Preprint.
Musiela, M. and Rutkowski, M. (1997), Martingale Methods in Financial Modeling. Springer-Verlag, Berlin, Heidelberg, New York.
Nelson, C. and Siegel, A. (1987), Parsimonious modelling of yield curves. Journal of Business 60, 473–89.
Ritchken, P. and Sankarasubramanian, L. (1995), Volatility structures of forward rates and the dynamics of the term structure. Mathematical Finance 5, 1, 55–72.
Vasiček, O. (1977), An equilibrium characterization of the term structure. Journal of Financial Economics 5, 177–88.
Warner, F.W. (1979), Foundations of Differentiable Manifolds and Lie Groups. Scott, Foresman, Hill.
Zabczyk, J. (1992), Stochastic invariance and consistency of financial models. Preprint. Scuola Normale Superiore, Pisa.

8 Towards a Central Interest Rate Model

Alan Brace, Tim Dun and Geoff Barton

1 Introduction

In recent years, the appearance of a new class of term structure of interest rate models has attracted the interest of practitioners. These so-called Market Models provide both an arbitrage-free pricing framework and pricing formulae that conform to the current (and accepted) market practice. This class of model can effectively be split into two types: those that model forward Libor rates, and those that model forward swap rates. The Libor rate models, such as those introduced in Miltersen et al. (1997), Brace et al. (1997) and Musiela and Rutkowski (1997a,b), allow caps to be priced in a manner consistent with market practice, while the swap rate models, such as the one proposed by Jamshidian (1997), do the same for swaptions. However, these two approaches are fundamentally incompatible because Libor rates and swap rates cannot both be lognormal in an arbitrage-free framework.

The formulae currently in use in the market are based on extensions of the well-known Black–Scholes option formula, and are, in fact, known as the Black cap and swaption formulae. In the case of swaptions, the swap rate replaces the stock price as the market observable parameter assumed to follow lognormal dynamics. Other concepts that are related to (and easily calculated using) the Black–Scholes option formula can also be extended to the case of swaptions, such as the option sensitivities or Greeks. These give an indication as to the likely magnitude and direction of the change in option price under changes in the swap rate value and/or volatility.

The Black formulae, however, are incapable of producing arbitrage-free prices for exotics, nor are they of much use as a ‘central’ interest rate model for bankwide risk management. These shortfalls constitute the original motivation for the development of term structure models. So how do the two types of Market Model mentioned above perform in these areas?


When pricing exotics, the natural tendency is to choose the most appropriate model for the task, hence Libor models for Libor based exotics, such as barrier caps triggered by Libor, and swap rate models for swap rate based exotics, such as barrier swaptions triggered by the swap rate. The case of cross-market exotics, however, is not so simple – how does one treat barrier swaptions triggered by Libor, and how does one calibrate simultaneously to both cap and swaption markets?

In the authors’ opinion, the Libor model is the unifying model – the Central Interest Rate Model – capable of encompassing the global properties of the swap rate model and tackling the problems related above. This is primarily because it is the most tractable mathematically, with Libor rates being lognormal under their own measures, without the restriction of only certain families of swap rates being lognormal. The model also prices swaptions and swap rate exotics, and, as we intend to argue in this paper, in practice it prices swaptions in a manner close to that of the market – and by extension – to the forward swap rate model. This indicates a closeness between the two types of Market Model.¹ We propose in this study, therefore, to examine the Libor model and its ability to price and hedge pure swap market products in comparison to the Black swaption formula, under arbitrary yield and volatility specifications, with the aim of revealing the closeness of the two approaches.

Our methodology is as follows. First, in Section 2, the notation and equations involved in swaption pricing within the Libor model are introduced. The Black swaption formula is also presented, along with the equations necessary to calculate the swaption Greeks and hedge swaptions. In Section 3, the actual distributional properties of the swap rate within the Libor model are examined analytically, to see whether it can be approximately modelled by a lognormal process. An expression is then derived for the volatility of this swap rate, allowing the approximate pricing of swaptions inside the Libor model using a Black type formula. In Section 4, approximation techniques are applied to derive equations inside the Libor model for swaption Greeks with respect to the swap rate. Here, only approximate relations at best may be expected, since in the Libor model the swap rate is a weighted sum of Libor rates, and not a single quantity as implied by the Black formula. These Greeks will, however, provide us with another mechanism for comparing the swaption modelling capabilities of the Libor model. Simulation techniques are then used to test the approximations from Sections 3 and 4 on a range of swaptions for two quite different volatility structures, with the results presented in Section 5. Tests are carried out to determine if the swaption Greeks derived are meaningful by undertaking a delta-hedging simulation and seeing if Libor model swaptions can be successfully hedged within the Libor model framework using Black-style hedging techniques. The results from these tests are also presented in Section 5. Finally, Section 6 states our conclusions on the work done, while the appendices contain additional results, both numerical and mathematical, for the interested reader.

¹ This closeness was first alluded to in the observation in Brace et al. (1997) that the Libor model swaption formula essentially reduces to the Black formula when yield and volatility are flat. Other authors to examine this behaviour include Jamshidian (1997) and Rebonato (1999).

2 Model preliminaries

In this section, we introduce the fundamental equations behind the lognormal Libor model, together with swap and swaption pricing within this model. The equivalent market pricing equations are then presented, and option sensitivities (or Greeks) defined. The section ends with a description of a method for translating the Greeks into actual hedges. Note that all the definitions, results and formulae in this section hold for both single and multi-factor models.

2.1 Lognormal Libor model

We consider the discrete tenor version of the lognormal forward Libor model, as described in Musiela and Rutkowski (1997a,b) and Jamshidian (1997), as opposed to the continuous tenor model in Brace et al. (1997). We start with an equi-spaced tenor structure defined by $T_j = T_0 + j\delta$ for $j = 1, \ldots, n$, where $\delta$ is a constant, typically three or six months. Time $t$ values of zero coupon bonds expiring on the tenor dates are expressed as $P(t, T_j)$, while the forward time $T$ price for a zero coupon bond maturing at $T_j \ge T$ is

\[ F_T(t, T_j) = \frac{P(t, T_j)}{P(t, T)}. \]

The forward Libor rate $K(t, T_j)$, expressing the simple forward interest rate between tenor dates $T_j$ and $T_{j+1}$, is related to the zero coupon bonds by

\[ K(t, T_j) = \frac{1}{\delta}\left(\frac{P(t, T_j)}{P(t, T_{j+1})} - 1\right). \]

We assume that we are equipped with a complete filtered probability space $(\Omega, \mathcal{F}, \mathbb{P})$ satisfying the ‘usual conditions’ (see Chapter 14 in Musiela and Rutkowski (1997a)). The dynamics of the forward Libor processes are then described by the stochastic differential equation

\[ dK(t, T_{j-1}) = K(t, T_{j-1})\,\gamma(t, T_{j-1}) \cdot dW_{T_j}(t) \tag{1} \]

where $\gamma(t, T_{j-1})$ is the forward Libor volatility function, and $W_{T_j}$ represents Brownian motion under the $\mathbb{P}$-equivalent forward measure $\mathbb{P}_{T_j}$. Adjacent forward measures are related by

\[ dW_{T_j}(t) = dW_{T_{j-1}}(t) + \frac{\delta K(t, T_{j-1})}{1 + \delta K(t, T_{j-1})}\,\gamma(t, T_{j-1})\,dt. \tag{2} \]

Consider now a forward payer swap, paid in arrears, with $n$ equal rolls starting at time $T_0$. In terms of zero coupon bonds, Libor rates and a strike value $\kappa$, the time $t$ value of the swap $\mathrm{Pswap}(t)$ can be written as

\[ \mathrm{Pswap}(t) = \mathrm{Pswap}(t, T_0, n) = \delta \sum_{j=1}^{n} P(t, T_j)\left(K(t, T_{j-1}) - \kappa\right). \tag{3} \]

The swap rate $\omega(t)$ is that unique value of the strike which gives the swap contract zero value, and is given by

\[ \omega(t) = \omega(t, T_0, n) = \frac{\sum_{j=1}^{n} P(t, T_j)\,K(t, T_{j-1})}{\sum_{j=1}^{n} P(t, T_j)} = \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,K(t, T_{j-1})}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}. \tag{4} \]

A swaption is formally defined as an option maturing at time $T_0$, on an underlying swap with strike $\kappa$. If the swap rate is greater than the strike at option maturity, then the swaption pays the difference between the two rates. The swaption price can, therefore, be expressed as

\[ \mathrm{Pswpn}(t) = \delta \sum_{j=1}^{n} P(t, T_j)\, E_{T_j}\!\left[\left(K(T, T_{j-1}) - \kappa\right) I(A) \,\middle|\, \mathcal{F}_t\right] \tag{5} \]
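As a minimal sketch of the definitions above, the following Python fragment computes forward Libor rates, the payer swap value (3) and the swap rate (4) from a vector of zero coupon bond prices. The function names and the flat 5% curve are illustrative assumptions and do not come from the text.

```python
import numpy as np

def forward_libor(P, delta):
    """K(t, T_j) = (P(t, T_j) / P(t, T_{j+1}) - 1) / delta for j = 0, ..., n-1.

    P holds the bond prices P(t, T_0), ..., P(t, T_n).
    """
    P = np.asarray(P, dtype=float)
    return (P[:-1] / P[1:] - 1.0) / delta

def swap_value(P, delta, kappa):
    """Payer swap value (3): delta * sum_j P(t, T_j) * (K(t, T_{j-1}) - kappa)."""
    P = np.asarray(P, dtype=float)
    K = forward_libor(P, delta)
    return delta * np.sum(P[1:] * (K - kappa))

def swap_rate(P, delta):
    """Swap rate (4): the strike at which the swap has zero value."""
    P = np.asarray(P, dtype=float)
    K = forward_libor(P, delta)
    return np.sum(P[1:] * K) / np.sum(P[1:])

# Hypothetical inputs: flat 5% continuously compounded curve,
# T_0 = 1 year, eight quarterly rolls.
delta = 0.25
tenors = 1.0 + delta * np.arange(9)   # T_0, ..., T_8
P = np.exp(-0.05 * tenors)
omega = swap_rate(P, delta)
```

By construction, `swap_value(P, delta, omega)` is zero up to rounding, which is a convenient sanity check on any implementation of (3) and (4).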

where $A = \{\mathrm{Swap}(T) \ge 0\}$ is the event that the swap ends up in-the-money. This expression does not allow an analytic solution; however, a good approximation can be found following the approach in Brace et al. (1997) or Brace (1996). This approximation was originally derived for the continuous tenor version of the model; however, it is equally valid in the discrete tenor model, as no dates outside of the discrete tenor structure appear in the formulae. Define the $n$-dimensional random vector

\[ X = (X_j) \overset{\text{def}}{=} \left(\int_t^{T_0} \gamma(s, T_{j-1}) \cdot dW_{T_j}(s)\right) \]

and approximate it by a Gaussian random vector by using a deterministic approximation (here a Wiener chaos expansion of order 0) to the stochastic drift term in (2). The mean vector $\mu$ and covariance matrix $\lambda$ of our approximation under the $\mathbb{P}_{T_0}$-measure are then given by

\[ X \sim N(\mu, \lambda), \qquad \mu = (\mu_j) = \left(\sum_{i=1}^{j} \frac{\delta K(t, T_{i-1})}{1 + \delta K(t, T_{i-1})}\,\lambda_{ij}\right), \qquad \lambda = (\lambda_{ij}) = \left(\int_t^{T_0} \gamma(s, T_{i-1}) \cdot \gamma(s, T_{j-1})\,ds\right), \tag{6} \]

where $N(\cdot)$ represents the multi-dimensional Gaussian cumulative distribution function. We find in practice that the symmetric matrix $\lambda$ (which we will term the swaption covariance matrix) is often of rank one, meaning that it can be expressed as the outer product of a vector $\Lambda = (\Lambda_j)$ with itself, as in $\lambda = \Lambda \Lambda^T$. Such a decomposition can be easily found through an eigenvector/eigenvalue analysis of the matrix. Using this rank one approximation $\Lambda$, we find the value of $s$ satisfying the relation

\[ \sum_{j=1}^{n} \frac{K(t, T_{j-1}) \exp\!\left(\Lambda_j (s + d_j) - \tfrac{1}{2}\Lambda_j^2\right) - \kappa}{\prod_{i=1}^{j}\left[1 + \delta K(t, T_{i-1}) \exp\!\left(\Lambda_i (s + d_i) - \tfrac{1}{2}\Lambda_i^2\right)\right]} = 0 \tag{7} \]

with

\[ d_j = \sum_{i=1}^{j} \frac{\delta K(t, T_{i-1})}{1 + \delta K(t, T_{i-1})}\,\Lambda_i, \]

and the approximate swaption price is then given by

\[ \mathrm{Pswpn}(t) \approx \delta \sum_{j=1}^{n} P(t, T_j)\left[K(t, T_{j-1})\,N(h_j) - \kappa\,N(h_j - \Lambda_j)\right] \tag{8} \]

where

\[ h_j = -(s + d_j - \Lambda_j). \tag{9} \]

Equation (8) provides an accurate approximation as long as the assumption holds that the covariance matrix $\lambda$ is of rank one. This assumption and its implications are discussed in more detail in Sections 4.1, 5.3 and 5.5.
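The procedure behind (7) to (9) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the rank one vector (called `Lam` here) is taken from the leading eigenpair of the covariance matrix, the root $s$ of (7) is found by plain bisection on an assumed bracket $[-10, 10]$, and all function names are hypothetical.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rank_one_vector(lam):
    """Leading eigenpair of the symmetric matrix lam, scaled so lam ~ Lam Lam^T."""
    w, v = np.linalg.eigh(lam)            # eigenvalues in ascending order
    Lam = math.sqrt(w[-1]) * v[:, -1]
    return Lam if Lam.sum() >= 0 else -Lam

def approx_swaption(P, K, Lam, delta, kappa):
    """Approximate payer swaption price (8).

    P   : P(t, T_1), ..., P(t, T_n)
    K   : K(t, T_0), ..., K(t, T_{n-1})
    Lam : rank one vector (Lam_1, ..., Lam_n), assumed positive
    """
    P, K, Lam = (np.asarray(a, dtype=float) for a in (P, K, Lam))
    d = np.cumsum(delta * K / (1.0 + delta * K) * Lam)

    def lhs(s):
        # Left-hand side of (7); increasing in s when all Lam_j > 0.
        Ks = K * np.exp(Lam * (s + d) - 0.5 * Lam**2)
        disc = np.cumprod(1.0 / (1.0 + delta * Ks))
        return float(np.sum((Ks - kappa) * disc))

    lo, hi = -10.0, 10.0                  # assumed bracket for the root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    s = 0.5 * (lo + hi)

    h = -(s + d - Lam)                    # equation (9)
    N1 = np.array([norm_cdf(x) for x in h])
    N2 = np.array([norm_cdf(x) for x in h - Lam])
    price = delta * float(np.sum(P * (K * N1 - kappa * N2)))
    return price, s
```

For a single roll ($n = 1$) the formula collapses to a Black caplet price, which gives a simple check of the root search and of the sign conventions in (9).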

2.2 Market swaption formula

In the Market (or Black) swaption pricing formula, swap rates are implicitly assumed lognormal under a single measure $\mathbb{P}_m$. For a swap of $n$ rolls, maturing at time $T_0$, this implies the following relation between the forward swap rate $\omega(t) = \omega(t, T_0, n)$ and its associated volatility $\sigma(t) = \sigma(t, T_0, n)$:

\[ d\omega(t) = \omega(t)\,\sigma(t) \cdot dW(t), \]

where $W(t)$ is Brownian motion under $\mathbb{P}_m$. In terms of $\omega(t)$, the present values of a payer swap and corresponding payer swaption are

\[ \mathrm{Pswap}(t) = \mathrm{Pswap}(t, T_0, n) = \delta \sum_{j=1}^{n} P(t, T_j)\,(\omega(t) - \kappa), \]

\[ \mathrm{Pswpn}(t) = \mathrm{Pswpn}(t, T_0, n) = \delta \sum_{j=1}^{n} P(t, T_j)\, E\!\left[(\omega(T_0) - \kappa)^+ \,\middle|\, \mathcal{F}_t\right] = \delta \sum_{j=1}^{n} P(t, T_j)\,B(t), \tag{10} \]

where $B(t)$ is Black’s call formula

\[ B(t) = \omega(t)\,N(h) - \kappa\,N\!\left(h - \sqrt{\zeta}\right), \tag{11} \]

in this case with

\[ h = \frac{\ln\!\left(\frac{\omega(t)}{\kappa}\right) + \tfrac{1}{2}\zeta}{\sqrt{\zeta}}, \qquad \zeta = \int_t^{T_0} |\sigma(s, T_0, n)|^2\,ds. \tag{12} \]

We denote the term $\zeta$ as the swaption zeta, representing a volatility term which also contains information on the time to maturity of the option. We will use it below to define a version of the option vega. For the sake of convenience, we denote the sum $\sum_{j=1}^{n} \delta P(t, T_j)$ as the present value of a basis point, or PVBP. In other references this sum has been given various other names, including the coupon process, the level, or even the annuity price.

The definition of sensitivities (or Greeks) for swaptions differs slightly from standard Black–Scholes type options due to the presence of the PVBP term and the fact that the swap rate is a forward rather than a spot value. We define, therefore, our Greeks in terms of forward values into the swaption discounted by the PVBP – this being a sensible definition in terms of hedging – as will be discussed in Section 2.3. This reduces the expressions for the Greeks to partial derivatives of the Black term $B(t)$, as in

\[ \text{Swaption delta} \quad \Delta = \frac{\partial}{\partial \omega}\left(\frac{\mathrm{Pswpn}(t)}{\delta \sum_{j=1}^{n} P(t, T_j)}\right) = \frac{\partial B}{\partial \omega} = N(h), \tag{13} \]

\[ \text{Swaption gamma} \quad \Gamma = \frac{\partial^2}{\partial \omega^2}\left(\frac{\mathrm{Pswpn}(t)}{\delta \sum_{j=1}^{n} P(t, T_j)}\right) = \frac{\partial^2 B}{\partial \omega^2} = \frac{1}{\omega \sqrt{\zeta}}\,N'(h), \tag{14} \]

and

\[ \text{Swaption vega} \quad \mathcal{V} = \frac{\partial}{\partial \zeta}\left(\frac{\mathrm{Pswpn}(t)}{\delta \sum_{j=1}^{n} P(t, T_j)}\right) = \frac{\partial B}{\partial \zeta} = \frac{\omega}{2\sqrt{\zeta}}\,N'(h), \tag{15} \]

where, as indicated above, we define our vega term slightly differently from the traditional way in that it is the derivative with respect to the swaption zeta, rather than an annualised volatility value as in Black–Scholes. This is done simply to ease computation later. Note that $N'(\cdot)$ represents the Gaussian density function. Note also that our gamma and vega are connected by the relation

\[ \mathcal{V} = \tfrac{1}{2}\,\omega^2\,\Gamma, \tag{16} \]

and we would expect our approximate formulae for $\Gamma$ and $\mathcal{V}$ in the lognormal Libor model (derived in Section 4) to satisfy this same constraint.
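The Black term (11) and the Greeks (13) to (15) are straightforward to evaluate. The sketch below uses only the Python standard library; the function names are illustrative, and the inputs are the forward swap rate `omega`, the strike `kappa` and the swaption zeta `zeta` as defined in (12).

```python
import math

def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def black_term(omega, kappa, zeta):
    """Black term B of (11) together with h of (12); zeta is total variance."""
    root = math.sqrt(zeta)
    h = (math.log(omega / kappa) + 0.5 * zeta) / root
    return omega * norm_cdf(h) - kappa * norm_cdf(h - root), h

def black_greeks(omega, kappa, zeta):
    """Delta (13), gamma (14) and vega with respect to zeta (15)."""
    root = math.sqrt(zeta)
    _, h = black_term(omega, kappa, zeta)
    delta = norm_cdf(h)
    gamma = norm_pdf(h) / (omega * root)
    vega = omega * norm_pdf(h) / (2.0 * root)
    return delta, gamma, vega
```

Multiplying `black_term(...)[0]` by the PVBP gives the swaption price (10). The values returned by `black_greeks` satisfy relation (16) identically, and they can be checked against central finite differences of `black_term` in `omega` and `zeta`.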

2.3 Swaption hedging

For Black–Scholes type options, the option delta $\Delta$ not only describes the first-order sensitivity of the option value to the underlying, but it also represents the probability of exercise of the option and hence can be used for hedging – giving the required hedge ratio into the underlying. The extension of this to the case of swaptions is complicated by the presence of the PVBP discount term in the pricing formula (10), and the fact that the swap rate is not a traded asset. One method² is to hedge using the underlying forward swap and the PVBP as the hedging instruments. The hedge then consists of two elements:

• a delta hedge of amount $\Delta = N(h)$ (from Section 2.2) into the underlying forward swap $\mathrm{Pswap}(t)$, and
• a bucket hedge of $\left(B(t) - \Delta\,(\omega(t) - \kappa)\right)$ into the PVBP.

This produces a portfolio which matches the swaption in value, and – with continual rebalancing – should match the swaption payoff at maturity. Often in practice the swaption is delta-hedged with the underlying swap while the PVBP terms are absorbed into the underlying book as cash flows, where they are hedged as part of the general exposure in different time buckets.

² For other methods see Dudenhausen et al. (1998) or Dun et al. (1999).

3 Swap rate dynamics in the Libor model

The Libor model is deliberately constructed in such a way that the forward Libor rates will be lognormal under certain probability measures – called forward measures – induced by using zero coupon bond prices as the numeraire. Similarly, the lognormal swap rate model chooses a specific numeraire so that under the measure it induces the forward swap rates will be lognormal. While this numeraire is quite valid within the Libor model framework, analytic tractability can only be obtained if we know the swap rate dynamics under one of the forward measures. Hence the aim of this section is to investigate the possibility of the swap rate being approximately lognormal under a certain forward measure – in this case the one corresponding to the maturity of the swaption, $\mathbb{P}_{T_0}$ – and to find an expression for its corresponding volatility.

3.1 Swap rate measure in the Libor model

The swap rate measure is the one induced by taking the PVBP $= \sum_{j=1}^{n} \delta P(t, T_j)$ as the numeraire. Under this measure the swap rate $\omega(t, T_0, n)$ will be a martingale. Denoting this measure, and the Brownian motion under it, as $\widetilde{\mathbb{P}}_{T_0}$ and $\widetilde{W}_{T_0}(t)$ respectively, we can demonstrate the relationship between $\widetilde{\mathbb{P}}_{T_0}$ and the Libor model maturity forward measure $\mathbb{P}_{T_0}$ as follows. Taking an arbitrary zero coupon bond $P(t, T_k)$ and applying Itô’s lemma to the quotient of it and the PVBP, we obtain

\[
\begin{aligned}
d\left(\frac{P(t, T_k)}{\delta \sum_{j=1}^{n} P(t, T_j)}\right) &= d\left(\frac{F_{T_0}(t, T_k)}{\delta \sum_{j=1}^{n} F_{T_0}(t, T_j)}\right) \\
&= \frac{F_{T_0}(t, T_k)}{\delta \sum_{j=1}^{n} F_{T_0}(t, T_j)} \left(\frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,\sigma(t, j)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)} - \sigma(t, k)\right) \\
&\quad \cdot \left(dW_{T_0}(t) + \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,\sigma(t, j)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}\,dt\right),
\end{aligned}
\tag{17}
\]

where we define $\sigma(t, n)$ as the stochastic function

\[ \sigma(t, n) = \sum_{i=1}^{n} \frac{\delta K(t, T_{i-1})}{1 + \delta K(t, T_{i-1})}\,\gamma(t, T_{i-1}). \]

The expression (17) is a martingale under $\widetilde{\mathbb{P}}_{T_0}$, which implies

\[ d\widetilde{W}_{T_0}(t) = dW_{T_0}(t) + \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,\sigma(t, j)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}\,dt, \tag{18} \]

giving us an explicit relation between Brownian motion under the swap rate measure $\widetilde{\mathbb{P}}_{T_0}$ and the swaption maturity forward measure $\mathbb{P}_{T_0}$. Further, by applying (2) recursively we arrive at

\[ d\widetilde{W}_{T_0}(t) = \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,dW_{T_j}(t)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}, \tag{19} \]

implying not only that $\widetilde{\mathbb{P}}_{T_0}$ is an equivalent measure to the forward measures $\mathbb{P}_{T_j}$, but the Brownian motion $\widetilde{W}_{T_0}$ under this measure is in fact a weighted average of the $W_{T_j}$. Given this relationship, and recalling that the swap rate will be a martingale under $\widetilde{\mathbb{P}}_{T_0}$, we feel justified in looking for a lognormal approximation to the swap rate $\omega(t, T_0, n)$ under any other of the $\mathbb{P}_{T_j}$, and in particular $\mathbb{P}_{T_0}$. Effectively we are choosing to neglect the drift term in (18), an approximation that we will verify by simulation in Section 5.1. Our next step is, assuming an approximate lognormal swap rate distribution under $\mathbb{P}_{T_0}$, to derive an expression for its volatility.
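The step from (18) to (19) can be spelled out as follows. This is a sketch that uses only relation (2) and the definition of $\sigma(t, j)$, writing $\widetilde{W}_{T_0}$ for the Brownian motion under the swap rate measure. Applying (2) for $i = 1, \ldots, j$ and summing gives

\[ dW_{T_j}(t) = dW_{T_0}(t) + \sum_{i=1}^{j} \frac{\delta K(t, T_{i-1})}{1 + \delta K(t, T_{i-1})}\,\gamma(t, T_{i-1})\,dt = dW_{T_0}(t) + \sigma(t, j)\,dt, \]

so that the weighted average of the $dW_{T_j}$ becomes

\[ \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,dW_{T_j}(t)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)} = dW_{T_0}(t) + \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,\sigma(t, j)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}\,dt, \]

which is the right-hand side of (18).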

3.2 Approximate swap rate volatility

As the swap rate definition (4) is effectively a weighted (by forward prices $F_{T_0}(t, T_j) / \sum_{i=1}^{n} F_{T_0}(t, T_i)$) average of Libor rates $K(t, T_j)$, it seems evident that the contribution to the swap rate volatility by the $K(t, T_j)$ will be significantly greater than that of the $F_{T_0}(t, T_j)$. In fact, in this analysis and much of that which follows, we will assume that the contribution in terms of volatility of the $F_{T_0}(t, T_j)$ is negligible and regard them (and hence also the $P(t, T_j)$) as essentially constant at their initial values. This assumption is tested and justified by simulation in Section 5.2.

Examining the individual terms which make up the swap rate (4), we see that they are martingales under the $T_0$-forward measure $\mathbb{P}_{T_0}$, as demonstrated by Equations (20) and (21) below.

\[ \frac{dF_{T_0}(t, T_j)}{F_{T_0}(t, T_j)} = -\sigma(t, j) \cdot dW_{T_0}(t) \tag{20} \]

\[ \frac{d\left[F_{T_0}(t, T_j)\,K(t, T_{j-1})\right]}{F_{T_0}(t, T_j)\,K(t, T_{j-1})} = \left(\gamma(t, T_{j-1}) - \sigma(t, j)\right) \cdot dW_{T_0}(t). \tag{21} \]

These terms will become lognormal if the stochastic term $\sigma(t, j)$ is approximated deterministically. In this case, both the numerator and denominator of (4) will be sums of lognormal processes, and these sums will also be approximately lognormal, as in the standard approximations used to price average rate options. Hence, the swap rate $\omega(t, T_0, n)$, being the ratio of approximate lognormal processes under $\mathbb{P}_{T_0}$, ought to be approximately lognormal itself (with a drift) under the same measure. Following this reasoning, we model the swap rate dynamics under $\mathbb{P}_{T_0}$ as

\[ d\omega(t, T_0, n) = \omega(t, T_0, n)\left[\mu(t, T_0, n)\,dt + \gamma(t, T_0, n) \cdot dW_{T_0}(t)\right] \tag{22} \]

and, neglecting the volatility contribution of the $F_{T_0}(t, T_j)$ as suggested above, we obtain the following approximate expression for the swap rate volatility $\gamma(t, T_0, n)$ in terms of the Libor rate volatilities $\gamma(t, T_j)$,

\[ \gamma(t, T_0, n) = \frac{\sum_{j=1}^{n} P(0, T_j)\,K(0, T_{j-1})\,\gamma(t, T_{j-1})}{\sum_{j=1}^{n} P(0, T_j)\,K(0, T_{j-1})} = \frac{\sum_{j=1}^{n} F_{T_0}(0, T_j)\,K(0, T_{j-1})\,\gamma(t, T_{j-1})}{\sum_{j=1}^{n} F_{T_0}(0, T_j)\,K(0, T_{j-1})}. \tag{23} \]

The ability of this equation to predict Libor model swaption volatilities and prices for a given yield curve and Libor volatility function $\gamma(t, T)$ will be tested in Section 5.3.³
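Equation (23) is a fixed-weight average of the Libor volatility vectors, which the following sketch makes explicit. As a simplifying assumption not made in the text, the Libor volatilities are taken to be constant in time over $[0, T_0]$, so that the swaption zeta is simply the squared norm of the swap rate volatility multiplied by $T_0$. The function names are hypothetical.

```python
import numpy as np

def swap_rate_vol(P0, K0, gamma):
    """Approximate swap rate volatility (23).

    P0    : P(0, T_1), ..., P(0, T_n)
    K0    : K(0, T_0), ..., K(0, T_{n-1})
    gamma : Libor volatilities gamma(t, T_{j-1}); shape (n,) for one factor
            or (n, m) for m factors
    """
    P0 = np.asarray(P0, dtype=float)
    K0 = np.asarray(K0, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    w = P0 * K0
    return (w / w.sum()) @ gamma

def swaption_zeta(P0, K0, gamma, T0):
    """Zeta over [0, T0], assuming time-constant Libor volatilities."""
    g = np.atleast_1d(swap_rate_vol(P0, K0, gamma))
    return float(g @ g) * T0
```

With identical Libor volatilities the weights sum to one and the swap rate volatility equals the common Libor volatility. In a multi-factor setting, imperfect correlation between the Libor rates lowers the norm of the weighted average.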

³ Note that an equivalent expression to (23) is independently derived by Rebonato (1999), who also employs simulation techniques to verify his results.

4 Greeks in the Libor model

Another mechanism for assessing the closeness of swaption pricing within the Libor model to the Black swaption formula is through the calculation of the swaption Greeks. In this section we use approximation techniques to derive equations for the swaption delta, gamma and vega under arbitrary volatility specifications.

As seen in Section 2.2, the definition and computation of the swaption delta, gamma and vega are straightforward in the framework implied by the Black swaption formula. Here, the swap rate is a real variable with respect to which we can differentiate, and its corresponding volatility can be expressed likewise – even if the model is multi-factor. For the Libor model, however, the swap rate is not a single quantity but a forward price-weighted sum of Libor rates – all of which can, to a certain extent, behave independently. This means that we do not have a real central variable with respect to which we can differentiate in order to define and compute swaption Greeks. The Libor rates are, however, related together by the swaption covariance matrix (defined in Section 2.1), and this matrix is often of rank one for both single and multi-factor volatility structures. This effectively implies that the Libor rates can, in fact, be described by a single variable. Taking this idea further, it implies – given the assumption of a rank one covariance matrix – the existence of a variable with which we can differentiate and define Greeks in the Libor model. This notion will be central to our approximation calculations below. Note that all the equations derived in this section will be examined numerically in Section 5.

4.1 Approximations

Here we give a formal list and explanation of the approximations and assumptions required to derive the equations for the swaption Greeks within the Libor model. Labelling them A1 to A4, we have:

A1. The discount terms ($F_{T_0}(t, T_j)$, $P(t, T_j)$) are constant at their initial time zero values;
A2. The swaption covariance matrix is of rank one;
A3. The volatility function is one-factor separable; and
A4. The forward probability measures can be merged into one single measure.

Approximation A1 was previously introduced in Section 3.2, where it was observed that the contribution of the volatility of the forward prices (and hence the zero coupon bonds) is essentially negligible. Assumption A2 is required in order to interrelate the Libor rates, and is, in fact, equivalent to A3, which is only included as a separate assumption for reasons of clarity. A3 assumes that we can approximate our (in general multi-factor) volatility function $\gamma(t, T)$ by a single-factor separable model, as in

\[ \gamma^{approx}(t, T) = \psi(t)\,\phi(T). \tag{24} \]

While this assumption seems quite restrictive, we note (see Appendix B) that it is entirely equivalent to Assumption A2, in that the volatility structure is separable if and only if the swaption covariance matrix is of rank one. Numerical results suggest that for most (non-extreme) volatility structures, the swaption covariance matrix is very close to rank one, validating both assumptions A2 and A3. This is considered in more detail in Section 5.3. The approximation (24) is constructed in such a way that it returns the rank one swaption covariance matrix

\[ (\lambda_{i,j}) = \left(\int_t^{T_0} \gamma(s, T_{i-1}) \cdot \gamma(s, T_{j-1})\,ds\right) = \left(\phi(T_{i-1})\,\phi(T_{j-1}) \int_t^{T_0} \psi^2(s)\,ds\right) = \Lambda \Lambda^T, \]

implying

\[ \Lambda_j = \phi(T_{j-1}) \sqrt{\int_t^{T_0} \psi^2(s)\,ds}. \tag{25} \]

Approximation A4 is used in simplifying the relationship between the Libor rates and in the computation of the swaption gamma and vega. Essentially it is analogous to the implicit assumption in the Black swaption formula (mentioned in Section 2.2) that the swap rates are assumed lognormal under a single measure $\mathbb{P}_m$.

We assume that calendar time $t = 0$ and introduce the abbreviated notation $K_j \sim K(0, T_j)$, $P_j \sim P(0, T_j)$, and $\phi_j \sim \phi(T_j)$, and the variable $U$ satisfying $dU = \psi(t)\,dW(t)$, where $W(t)$ is Brownian motion under the single measure into which all the forward measures have been merged. Applying assumptions A1, A3 and A4 to Equations (1) and (4), we have the following simplified equations for the Libor and swap rate processes

\[ dK(t, T_{j-1}) = K(t, T_{j-1})\,\psi(t)\,\phi_{j-1}\,dW_{T_j}(t) = K(t, T_{j-1})\,\phi_{j-1}\,dU, \tag{26} \]

and

\[ d\omega = \frac{\sum_j P_j\,K_{j-1}\,\phi_{j-1}}{\sum_j P_j}\,dU. \tag{27} \]

With these assumptions/approximations, we can now proceed to derive equations for the swaption Greeks in the Libor model.
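The link between the separable form (24) and a rank one covariance matrix (25) can be checked numerically. The sketch below builds the covariance matrix by trapezoidal quadrature for a hypothetical choice of $\psi$ and $\phi$ (both are illustrative assumptions), and compares it with the outer product of the vector given by (25), called `Lam` in the code.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y on the grid x."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def covariance_matrix(psi, phi, T0, steps=2000):
    """lambda_ij = int_0^T0 gamma(s, T_{i-1}) gamma(s, T_{j-1}) ds
    for the separable volatility gamma(s, T_{j-1}) = psi(s) * phi_{j-1}."""
    s = np.linspace(0.0, T0, steps + 1)
    g = np.asarray(phi, dtype=float)[:, None] * psi(s)[None, :]
    return trapezoid(g[:, None, :] * g[None, :, :], s), s

# Hypothetical separable volatility structure.
phi = np.array([0.15, 0.20, 0.25, 0.30])
psi = lambda s: np.exp(-0.5 * s)

lam, grid = covariance_matrix(psi, phi, T0=1.0)
V = trapezoid(psi(grid) ** 2, grid)        # integral of psi^2 over [0, T0]
Lam = phi * np.sqrt(V)                     # equation (25)
singular_values = np.linalg.svd(lam, compute_uv=False)
```

Here `lam` agrees with `np.outer(Lam, Lam)` up to rounding, and all singular values after the first are numerically zero. For a volatility structure that is not separable, the ratio of the second to the first singular value gives a rough measure of how far the covariance matrix is from rank one.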

4.2 Libor model delta

In the case of single-factor volatility functions, a swaption delta can be derived with minimal approximation by eliminating stochastic terms in the stochastic differential equations for the swap and swaption. Here we consider a different method involving differentiation inside the expectation term, a method which will be further utilised in Section 4.3 to derive an expression for the swaption gamma. Note, however, that both methods would produce an equivalent expression for the swaption delta.

Define $\Delta_{i-1}$ to be the partial derivative of the swaption price with respect to the Libor rate $K(0, T_{i-1})$. Denoting the swaption price $\mathrm{Pswpn}(0)$ as $S$, we have, using (5),

\[
\begin{aligned}
\Delta_{i-1} = \frac{\partial S}{\partial K_{i-1}} &= \frac{\partial}{\partial K_{i-1}}\left(\delta \sum_j P_j\,E_{T_j}\!\left[\left(K(T, T_{j-1}) - \kappa\right) I(A)\right]\right) \\
&= \delta \sum_j P_j\,E_{T_j}\!\left[\frac{\partial K(T, T_{j-1})}{\partial K_{i-1}}\,I(A) + \left(K(T, T_{j-1}) - \kappa\right)\frac{\partial I(A)}{\partial K_{i-1}}\right].
\end{aligned}
\]

By measure transformation, the second term inside the expectation can be shown to equate to

\[ P(0, T)\,E_T\!\left[\mathrm{Swap}(T)\,\frac{\partial I(A)}{\partial K_{i-1}}\right] = 0 \]

since

\[ \frac{\partial I(A)}{\partial K(0, T_{i-1})} = 0 \quad \text{if } \mathrm{Swap}(T) \ne 0. \]

Using the integrated version of Equation (1), we can then show that the remaining expression reduces to

\[ \Delta_{i-1} = \delta\,P_i\,N(h_i) \tag{28} \]

where the $h_i$ are given by (9). Treating $U$ as a real variable, we now obtain an expression for the swaption delta in the Libor model using the definition (13) from Section 2.2,

\[
\begin{aligned}
\Delta &= \frac{\partial}{\partial \omega}\left(\frac{S}{\delta \sum_j P_j}\right) = \frac{1}{\delta \sum_j P_j}\,\frac{\partial S}{\partial \omega} \qquad (29) \\
&= \frac{1}{\delta \sum_j P_j} \sum_j \frac{\partial S}{\partial K_{j-1}}\,\frac{\partial K_{j-1}}{\partial U}\,\frac{\partial U}{\partial \omega} \\
&= \frac{1}{\delta \sum_j P_j} \sum_j \Delta_{j-1}\,K_{j-1}\,\phi_{j-1}\,\frac{\sum_i P_i}{\sum_i P_i\,K_{i-1}\,\phi_{i-1}} \\
&= \frac{\sum_j P_j\,N(h_j)\,K_{j-1}\,\phi_{j-1}}{\sum_j P_j\,K_{j-1}\,\phi_{j-1}}. \qquad (30)
\end{aligned}
\]

Equation (30) is tested against the Black swaption delta in Section 5.6, and in terms of swaption hedging in Section 5.8.

4.3 Libor model gamma Building on the approach of Section 4.2, we can now derive an expression for the swaption gamma in the Libor model. The first step is to calculate second derivatives of the Libor model swaption with respect to the K (·) – which we will denote as i,k – and then, using the assumptions of Section 4.1, obtain a single number that can be compared to the gamma given by the Black formula. We have4 ∂ 2 Pswpn(0) ∂ K i−1 ∂ K k−1

∂ K (T, Ti−1 ) ∂I (Swap(T )) = δ Pi ETi ∂ K i−1 ∂ K k−1

i−1,k−1 =

+ 4 Use the formulae d(x) = I(x), dI(x) = δ {x}, where I (·) is the Heaviside function and δ {·} is the Dirac

delta function.

dx

dx

8. Towards a Central Interest Rate Model

291

∂ K (T, Ti−1 ) ∂ K (T, Tk−1 ) = δ 2 Pi ETi P (T, Tk ) ∂ K i−1 ∂ K k−1 " 66 n ×δ δ P(T, T j ) K (T, T j−1 ) − κ . j=1

With assumption A4, and setting Z ∼ N (0, 1), it follows that,5

2 i−1,k−1 < δ Pi Pk E e(i Z ) e(k Z )δ δ P(T, T j ) K j−1 e j Z − κ j

= δ Pi Pk exp (i k )

P(T, T j ) K j−1 e j [Z + i + k ] − κ . ×E δ δ 2

j

Assuming that the ‘s’ satisfying (7) also approximately satisfies P(T, T j ) K j−1 exp j s − 12 2j − κ = 0,

(31)

j

then we have i−1,k−1 < =

δ Pi Pk exp (i k ) N (s − i − k ) 1 2 j P j K j−1 j exp j s − 2 j

δ Pi Pk N (s − i ) N (s − k ) . j P j K j−1 j N (s − j )

(32)

Using our definition for the swaption gamma (14), we can derive an expression in terms of the partial derivatives derived above, giving # $ ∂2 S 1 ∂ ∂S = = 2 ∂ω δ j P j δ j P j ∂ω ∂ω ∂ ∂ S ∂ K j−1 ∂U 1 . (33) = δ j P j j ∂ K j−1 ∂ω ∂U ∂ω Recall from Section 4.2 that we have ∂ K j−1 ∂U ∂U ∂ω ∂S ∂ω

Pi K j−1 j i Pi K i−1 i ∂ S ∂ K j−1 ∂U = ∂ K j−1 ∂U ∂ω j Pi = i j−1 K j−1 j , i Pi K i−1 i j =

i

5 If $X$ is a random variable under some given measure, then $e(X) = \exp\left(X - \tfrac{1}{2}\operatorname{Var} X\right)$.


and substituting these into (33) and taking the partial derivative gives us =

δ

j

Pj

2

i j K i−1 K j−1 i−1, j−1

P j K j−1 j # $# $ j Pj 2 2 + P j K j−1 j j−1 K j−1 j 2 j j δ j P j K j−1 j # $# $ 2 − j−1 K j−1 j P j K j−1 j i

j

j

j

j

in which the second term can be shown to be the difference of two quantities of similar order of magnitude and is hence taken to be zero. Substitution of (32) and collecting terms gives us our final expression for the Libor model swaption gamma =

j

Pj

j

P j K j−1 j N (s − j ) . 2 P K j j−1 j j

(34)

4.4 Libor model vega

Finally, we wish to derive an equation for the swaption vega in the Libor model. Combining the approximate swap rate volatility equation (23) with Assumption A3 of an instantaneous one-factor separable volatility (24), we obtain

$$\gamma(t, T_0, n) = \psi(t)\,\frac{\sum_j P_j K_{j-1}\phi_{j-1}}{\sum_j P_j K_{j-1}}.$$

The swaption zeta in the Libor model corresponding to (12) is

$$\zeta = \int_0^{T_0} |\gamma(s, T_0, n)|^2\,ds = \left(\frac{\sum_j P_j K_{j-1}\phi_{j-1}}{\sum_j P_j K_{j-1}}\right)^2 \int_0^{T_0}\psi^2(s)\,ds,$$

and following the methodology presented in Section 2.2 we want to partially differentiate with respect to this variable to obtain the vega. To do this, we will denote by $V$ the integral $\int_0^{T_0}\psi^2(s)\,ds$ and assume that this constitutes the variable part of $\zeta$, implying

$$\frac{\partial\zeta}{\partial V} = \left(\frac{\sum_j P_j K_{j-1}\phi_{j-1}}{\sum_j P_j K_{j-1}}\right)^2. \qquad (35)$$


From the definition of the vega (15), we have # $ ∂S ∂ Pswpn(0) 1 = = ∂ζ δ j Pj δ j P j ∂ζ =

δ

∂S ∂V , P j ∂ V ∂ζ

1 j

where, in this case, we can obtain the partial derivative ∂ S/∂ V by direct differentiation of the swaption formula (8). Using the additional assumption (implicit in the use of (31)) that d j ≈ 0, gives us ∂h j ∂(h j − j ) ∂S = δ N (h j ) − κ N (h j − j ) P j K j−1 ∂V ∂V ∂V j ∂s ∂ j ∂s + N (−s + j ) + κ N (s) = δ P j K j−1 − ∂V ∂V ∂V j ∂s P j K j−1 exp(s j − 12 2j ) − κ N (s) = δ − ∂V j ∂ j P j K j−1 +δ N (s − j ), ∂V j where the first term can be seen to satisfy (31) and so can be taken as zero. Partial differentiation of (25) yields φ j−1 ∂ j j = √ = ∂V 2V 2 V and hence δ ∂S = P j K j−1 j N (s − j ). ∂V 2V j Substituting from above as necessary, the vega is therefore = = =

δ

1 j

∂S ∂V P j ∂ V ∂ζ #

1

P j K j−1

$2

P j K j−1 j N (s − j ) P P K φ j j j−1 j−1 j j j # $2 1 j P j K j−1 P j K j−1 j N (s − j ). 2 j Pj P K j j−1 j j j 2V

j

(36)


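Noting from (4) that $\omega = \sum_j P_j K_{j-1} / \sum_j P_j$, we see that the gamma and vega equations (34) and (36) satisfy the constraint (16) imposed on them in Section 2.2,

$$\frac{\text{vega}}{\text{gamma}} = \frac{1}{2}\left(\frac{\sum_j P_j K_{j-1}}{\sum_j P_j}\right)^2 = \frac{1}{2}\,\omega^2.$$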
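The same proportionality can be checked in the Black framework itself. The sketch below is illustrative only: it assumes the standard Black payer form, written per unit of annuity and in terms of the total variance $\zeta$, which may differ in notation from formula (10). The function names and the numerical inputs are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def _d1(omega, kappa, zeta):
    # Assumed standard Black argument, with zeta the total variance to expiry.
    return (log(omega / kappa) + 0.5 * zeta) / sqrt(zeta)


def black_value(omega, kappa, zeta):
    """Black payer swaption value per unit of annuity (delta * sum_j P_j)."""
    d1 = _d1(omega, kappa, zeta)
    return omega * STD_NORMAL.cdf(d1) - kappa * STD_NORMAL.cdf(d1 - sqrt(zeta))


def black_gamma(omega, kappa, zeta):
    """Second derivative of black_value with respect to the swap rate omega."""
    return STD_NORMAL.pdf(_d1(omega, kappa, zeta)) / (omega * sqrt(zeta))


def black_vega(omega, kappa, zeta):
    """Derivative of black_value with respect to the total variance zeta."""
    return omega * STD_NORMAL.pdf(_d1(omega, kappa, zeta)) / (2.0 * sqrt(zeta))


if __name__ == "__main__":
    omega, kappa, zeta = 0.07, 0.075, 0.04  # illustrative inputs only
    ratio = black_vega(omega, kappa, zeta) / black_gamma(omega, kappa, zeta)
    print(ratio, 0.5 * omega ** 2)  # the two numbers should agree
```

Under these assumptions the ratio of vega to gamma does not depend on the strike or on $\zeta$, which is why one Greek can be read off from the other.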

5 Numerical testing and results

Ultimately, the closeness of swaption pricing within the Libor model to the Black swaption formula must be tested numerically. In this section, the assumptions fundamental to the analysis are verified, the regime used to test the equations is explained, and the results of the numerical testing are presented.

In order to test the approximate equations for volatility, pricing and Greeks thoroughly, a range of swaptions, strike values, yield curves and volatility specifications is required. In this light, it was decided to test a matrix consisting of 15 swaptions with maturity values ranging from 0.25 to 4 years, lengths of 1 to 8 years, and strike values in-, at- and out-of-the-money. The tests were conducted for two separate volatility specifications: the first a single-factor homogeneous parameterisation fitted to actual historic data, chosen to reflect typical market conditions, and the second an artificial two-factor volatility function chosen to mimic a pathological market situation and stress test the results. Further details on the volatility specifications and their associated yield curves are given in Appendix C.

With the Black pricing formula, the price and Greeks can all be computed upon specification of the Black volatility σ. This is not the case in the Libor model, where an equivalent Black volatility can be obtained only by first computing the price and then ‘backing out’ the volatility by solving Equation (10) for a constant valued volatility function σ. Given that any comparison between prices and Greeks would be meaningless if not computed at a Black volatility common to both frameworks, we define the Libor model true price as the value obtained from simulation, and the true volatility as the value obtained by backing out the true price at-the-money.

The necessity of this distinction becomes apparent when one notes that the Libor swaption pricing formula (8) only gives an approximate price, and one that can deviate from the true value under certain circumstances. The simulated price, however, is a reflection of the exact price and, by exploiting variance reduction techniques, can be made as accurate as required. This provides us with a number, free of approximation, which can be used objectively for comparison purposes. We start, however, by verifying the assumptions used in deriving the various approximations.
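The ‘backing out’ step can be sketched as a one-dimensional root search. The code below is a minimal illustration and not the routine used for the results in this chapter: it assumes the standard Black payer swaption form with a constant volatility, and the function names, the bisection bracket and the example inputs are all hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def black_swaption_price(omega, kappa, sigma, expiry, annuity):
    """Assumed standard Black payer form: annuity * [omega N(d1) - kappa N(d2)]."""
    root = sigma * sqrt(expiry)
    d1 = (log(omega / kappa) + 0.5 * root * root) / root
    return annuity * (omega * STD_NORMAL.cdf(d1) - kappa * STD_NORMAL.cdf(d1 - root))


def implied_black_vol(price, omega, kappa, expiry, annuity, lo=1e-6, hi=5.0, tol=1e-12):
    """Back out sigma by bisection; the Black price is increasing in sigma."""
    p_lo = black_swaption_price(omega, kappa, lo, expiry, annuity)
    p_hi = black_swaption_price(omega, kappa, hi, expiry, annuity)
    if not p_lo <= price <= p_hi:
        raise ValueError("price is outside the range attainable on [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black_swaption_price(omega, kappa, mid, expiry, annuity) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical at-the-money example: 7% swap rate, 1 year expiry, annuity 0.9.
    target = black_swaption_price(0.07, 0.07, 0.20, 1.0, 0.9)
    print(round(1e4 * target, 2), "bp")  # price per unit notional, in basis points
    print(implied_black_vol(target, 0.07, 0.07, 1.0, 0.9))
```

Bisection is chosen here only for robustness; any monotone root finder would serve, and the input price would in practice be the simulated (true) price.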


Fig. 1. Normal probability plot of the log of the swap rates simulated under the Libor model for a 1/8 swap using the second volatility structure.

5.1 Lognormality of the swap rates

In Section 3.1 it was postulated that the swap rate ω could be modelled as being approximately lognormal under the $P_{T_0}$ forward measure. This was tested numerically by simulating swap rates under the appropriate measure within the Libor model framework. The simulation was performed by discretising the stochastic differential equations for the Libor rates (1) to produce sets of future yield curves from which the swap rates could be extracted.6 Statistical tests were then applied to the swap rates to determine the nature of the resulting distributions.

Figure 1 is an example of one of those statistical tests: a normal probability plot of the log of the simulated swap rates, in this case for an eight year swap, maturing in one year, simulated using the pathological volatility structure. A normal probability plot allows one to determine whether random observations come from a normally distributed population, a straight line indicating the affirmative. Slight deviations at either end of the line are common, as a finite number of samples will never be able to fit the infinite tails of the normal distribution exactly. The test can be formalised through the use of quantitative statistical tests (such as the Shapiro–Wilk test), or a goodness-of-fit test between the expected and observed sample frequencies. The latter was used in this case. All the swaptions for both volatility structures gave similar results to those in Figure 1 and, at a 95% confidence level, the hypothesis of a lognormal probability distribution could not be rejected.

6 See Brace (1998) for details of the simulation routine used, and Glasserman and Zhao (2000) for detailed analysis of a range of simulation methods in the forward Libor model.
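A goodness-of-fit test of this kind can be sketched as follows. This is an illustration under stated assumptions, not the test procedure actually used: it bins the log-samples into ten classes of equal probability under a fitted normal distribution and compares the chi-square statistic with the 95% quantile for 7 degrees of freedom (10 bins, less one, less two estimated parameters). The function name and the binning choice are hypothetical.

```python
from math import log
from statistics import NormalDist, fmean, stdev

# 95% quantile of the chi-square distribution with 7 degrees of freedom;
# only appropriate for the default of 10 bins used below.
CHI2_95_DF7 = 14.067


def lognormal_chi2(samples, bins=10):
    """Chi-square goodness-of-fit statistic for normality of log(samples)."""
    logs = [log(x) for x in samples]
    fitted = NormalDist(fmean(logs), stdev(logs))
    # Interior bin edges chosen so each bin has probability 1/bins under the fit.
    edges = [fitted.inv_cdf(k / bins) for k in range(1, bins)]
    counts = [0] * bins
    for y in logs:
        counts[sum(1 for e in edges if y > e)] += 1
    expected = len(logs) / bins
    return sum((c - expected) ** 2 / expected for c in counts)
```

A statistic below `CHI2_95_DF7` means that lognormality is not rejected at the 5% level; the simulated swap rates would be passed in as `samples`.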


Fig. 2. The ratio between simulated swap rates with and without the effect of the zero coupon bonds.

5.2 Swap rate approximation

The approximations in Sections 3–4 rely on the assumption that the contribution of the volatility of the discount terms (forward prices and zero coupon bonds) towards the overall volatility of the swap rate is negligible, and that the discount terms can be considered constant at their initial values. Figure 2 confirms the validity of this assumption on the swap rate for a 1/5 swap, simulated using the second volatility structure. It shows the ratio of the simulated swap rate calculated using all the discount terms to the value obtained by taking these terms as constant. A value of 1 indicates that the calculation methods are equivalent. This figure demonstrates that the assumption is quite reasonable, leading to errors in the swap rate that are generally below one per cent.
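The comparison behind Figure 2 can be illustrated with a small calculation. The sketch below assumes simply compounded forward Libor rates over equal accrual periods $\delta$ and writes the swap rate as the weighted average $\omega = \sum_j P_j K_{j-1} / \sum_j P_j$ of equation (4); the function names and the example curve are hypothetical, and no simulation is performed.

```python
def discount_factors(libor, delta):
    """P_0 = 1, P_j = P_{j-1} / (1 + delta * K_{j-1}), relative to the swap start."""
    p = [1.0]
    for k in libor:
        p.append(p[-1] / (1.0 + delta * k))
    return p


def swap_rate(libor, delta):
    """omega = sum_j P_j K_{j-1} / sum_j P_j for j = 1..n."""
    p = discount_factors(libor, delta)
    return sum(pj * k for pj, k in zip(p[1:], libor)) / sum(p[1:])


def frozen_weight_swap_rate(libor_now, libor_initial, delta):
    """Swap rate with the discount terms held constant at their initial values."""
    p0 = discount_factors(libor_initial, delta)
    return sum(pj * k for pj, k in zip(p0[1:], libor_now)) / sum(p0[1:])


if __name__ == "__main__":
    initial = [0.070, 0.072, 0.074, 0.076]  # hypothetical forward Libor rates
    shocked = [1.2 * k for k in initial]    # hypothetical later curve
    ratio = swap_rate(shocked, 0.25) / frozen_weight_swap_rate(shocked, initial, 0.25)
    print(ratio)  # values near 1 support freezing the discount terms
```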

5.3 Rank one covariance matrix

The Libor model swaption formula (8) and all the analysis in Section 4 are fundamentally dependent on the assumption that the swaption covariance matrix λ is of rank one. A symmetric matrix is of rank one when it has only one non-zero eigenvalue. A rank one approximation to an arbitrary symmetric matrix will only be accurate if the ratio of the second largest to the largest eigenvalue is small.


Table 1. Ratio of the second largest to the largest eigenvalue for the swaption covariance matrices (both volatility structures).

Volatility   Swaption            Swaption maturity
structure    length       0.25       1        2        4
1            1            0.0%     7.5%     1.5%     2.1%
             2            0.0%     1.5%     3.2%     3.5%
             4            0.0%     2.1%     3.5%     5.9%
             8            0.0%     1.6%     2.7%
2            1            0.5%     1.0%     1.6%     1.6%
             2            4.4%     6.7%     8.2%     4.8%
             4           30.8%    27.9%    17.3%     6.4%
             8           20.5%    13.0%     7.9%

In the case of the Libor model, the rank of the swaption covariance matrix will depend on the form of the volatility function γ (t, T ), and the maturity and length of the individual swaption. A swaption is said to be exhibiting rank two behaviour when the rank one price (8) begins to deviate from the true price. This seems to occur for an eigenvalue ratio of 5% or above, with 20–30% representing extreme values. Table 1 shows this ratio for all the swaptions and volatility structures considered in this paper. A value of 0 represents a swaption covariance matrix of rank one. The second volatility structure was chosen for its pathological nature, and this is reflected in the more extreme values for the eigenvalue ratio seen here. It would not be surprising, therefore, if the approximations of Section 4 were to break down for some of the swaptions under the second volatility structure.
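The eigenvalue ratio can be estimated numerically along the following lines. This is a sketch under several assumptions that are not taken from the text: the swaption covariance matrix is taken to be $\lambda_{ik} = \int_0^{T_0} \gamma(s, T_{i-1})\cdot\gamma(s, T_{k-1})\,ds$, the accrual period is set to $\delta = 0.25$, the integral uses a midpoint rule, and the two largest eigenvalues are found by power iteration with deflation. The volatility function is the second structure of Appendix C.2; all function names are hypothetical, and the printed value need not reproduce Table 1.

```python
from math import exp, sqrt


def gamma_two_factor(t, T):
    """Two-factor volatility of Appendix C.2, returned as (gamma_1, gamma_2)."""
    tau = T - t
    return (0.05 * tau if tau < 6 else 0.3, 0.3 * exp(-0.54 * tau))


def covariance_matrix(vol, maturity, length, delta=0.25, steps=400):
    """Midpoint-rule estimate of int_0^maturity vol(s, T_i) . vol(s, T_k) ds."""
    n = round(length / delta)
    resets = [maturity + j * delta for j in range(n)]
    h = maturity / steps
    cov = [[0.0] * n for _ in range(n)]
    for m in range(steps):
        vols = [vol((m + 0.5) * h, T) for T in resets]
        for i in range(n):
            for k in range(n):
                cov[i][k] += h * sum(a * b for a, b in zip(vols[i], vols[k]))
    return cov


def top_eigenpair(a, iters=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(a)
    v = [1.0 / (i + 1) for i in range(n)]  # generic, non-symmetric start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        lam = norm / sqrt(sum(x * x for x in v))
        v = [x / norm for x in w]
    return lam, v


def eigenvalue_ratio(a):
    """Ratio of the second largest to the largest eigenvalue."""
    lam1, v1 = top_eigenpair(a)
    if lam1 == 0.0:
        raise ValueError("matrix has no non-zero eigenvalue")
    n = len(a)
    # Deflation: remove the leading eigen-direction, then iterate again.
    deflated = [[a[i][k] - lam1 * v1[i] * v1[k] for k in range(n)] for i in range(n)]
    lam2, _ = top_eigenpair(deflated)
    return lam2 / lam1


if __name__ == "__main__":
    cov = covariance_matrix(gamma_two_factor, maturity=0.25, length=4.0)
    print(eigenvalue_ratio(cov))
```

A separable volatility passed to `covariance_matrix` should give a ratio of essentially zero, in line with the rank one case discussed above.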

5.4 Swap rate volatility

In Section 3.2, we derived the approximate equation (23) for the equivalent Black volatility of a Libor model swaption. In Table 2 we compare values given by this equation to the true volatility, defined in Section 5 as the volatility implied by the at-the-money simulation price of the corresponding swaption within the Libor model framework. The results indicate that the volatility approximation is quite accurate: all the values for rank one swaptions are within about 12 basis points of the true volatility, with this figure rising to 80 basis points for the more extreme rank two swaptions occurring under the second volatility structure. In general, however, the approximate volatility equation (23) provides a good indication of the Libor model true volatility.


Table 2. Black volatility verification results for both volatility structures.

Volatility  Swaption  Volatility               Swaption maturity
structure   length    description       0.25        1         2         4
1           1         true              4.64%     5.73%    10.14%    17.59%
                      approximation     4.65%     5.74%    10.15%    17.61%
            2         true              6.97%     9.37%    14.23%    18.58%
                      approximation     6.98%     9.38%    14.24%    18.58%
            4         true             14.02%    15.53%    17.56%    18.51%
                      approximation    14.07%    15.57%    17.59%    18.56%
            8         true             15.32%    15.80%    16.57%
                      approximation    15.44%    15.90%    16.65%
2           1         true             23.16%    19.81%    17.46%    17.76%
                      approximation    23.20%    19.85%    17.50%    17.75%
            2         true             18.60%    16.64%    16.26%    18.06%
                      approximation    18.72%    16.74%    16.17%    18.04%
            4         true             15.79%    15.81%    16.67%    20.24%
                      approximation    15.85%    15.68%    16.41%    20.13%
            8         true             18.37%    19.05%    20.34%
                      approximation    17.88%    18.35%    19.54%

5.5 Swaption prices

Table 3 compares swaption prices for the first volatility structure. Three different prices are given: the true value obtained by simulation, an approximate value obtained by using the Black swaption formula (10) with the swap rate volatility approximation (23), and the Libor model rank one price (8). The prices are expressed in basis points (bp), where 1 bp = $100 per $1M face value. As with the swaption volatilities above, for the rank one swaptions the volatility approximation provides a reasonable estimate of the swaption price. As is to be expected, the Libor model price performs better in most situations. The deviation between the true and rank one prices is evident in the rank two swaptions under the second volatility structure (shown in Appendix A), and it is not surprising to note that under these circumstances the volatility approximation mirrors the rank one price more than the true price. In general, however, these results show that a Libor model swaption behaves very much like a Black swaption with the volatility given by Equation (23).


Table 3. Swaption price comparisons for the first volatility structure (all values expressed in basis points).

Swaption           Price                    Swaption maturity
length    Strike   description      0.25        1         2         4
1         IN       true            12.52     30.34     68.87    126.86
                   vol approx      12.53     30.35     68.96    126.93
                   rank 1          12.52     30.32     68.85    126.91
          AT       true             6.18     15.37     37.22     78.97
                   vol approx       6.18     15.37     37.25     79.04
                   rank 1           6.18     15.35     37.20     79.02
          OUT      true             2.29      5.59     13.06     25.16
                   vol approx       2.29      5.59     13.00     25.18
                   rank 1           2.29      5.58     13.05     25.17
2         IN       true            37.29     94.79    178.16    254.77
                   vol approx      37.34     95.08    178.61    254.79
                   rank 1          37.29     94.77    178.07    254.72
          AT       true            18.56     49.42    100.45    160.62
                   vol approx      18.59     49.51    100.54    160.65
                   rank 1          18.56     49.40    100.36    160.56
          OUT      true             6.83     17.86     34.55     50.74
                   vol approx       6.83     17.68     34.17     50.77
                   rank 1           6.83     17.85     34.54     50.69
4         IN       true           140.40    282.57    397.87    475.04
                   vol approx     140.93    283.66    398.71    475.78
                   rank 1         140.38    282.38    397.51    475.20
          AT       true            71.82    154.19    231.58    299.12
                   vol approx      72.09    154.57    231.90    299.90
                   rank 1          71.81    154.02    231.16    299.26
          OUT      true            26.16     54.22     77.45     94.53
                   vol approx      26.04     53.63     77.18     94.79
                   rank 1          26.18     54.24     77.44     94.35
8         IN       true           272.66    507.00    666.23
                   vol approx     273.80    509.09    668.80
                   rank 1         272.47    506.52    665.36
          AT       true           139.66    276.30    383.50
                   vol approx     140.79    278.07    385.49
                   rank 1         139.57    276.10    382.81
          OUT      true            50.09     95.81    128.49
                   vol approx      50.68     96.33    129.04
                   rank 1          50.12     96.03    128.74


Table 4. Delta comparisons for Libor and Black swaptions for the first volatility structure.

Swaption                         Swaption maturity
length    Strike   Model     0.25       1        2        4
1         IN       Black    0.750    0.750    0.750    0.750
                   Libor    0.751    0.751    0.752    0.750
          AT       Black    0.505    0.511    0.529    0.570
                   Libor    0.506    0.512    0.531    0.570
          OUT      Black    0.250    0.250    0.250    0.250
                   Libor    0.251    0.250    0.252    0.250
2         IN       Black    0.750    0.750    0.750    0.750
                   Libor    0.752    0.755    0.755    0.750
          AT       Black    0.507    0.519    0.540    0.574
                   Libor    0.508    0.523    0.545    0.574
          OUT      Black    0.250    0.250    0.250    0.250
                   Libor    0.251    0.255    0.255    0.250
4         IN       Black    0.751    0.750    0.750    0.750
                   Libor    0.756    0.757    0.755    0.751
          AT       Black    0.514    0.531    0.549    0.573
                   Libor    0.519    0.538    0.554    0.574
          OUT      Black    0.249    0.249    0.250    0.249
                   Libor    0.255    0.257    0.254    0.250
8         IN       Black    0.752    0.751    0.751
                   Libor    0.755    0.756    0.754
          AT       Black    0.515    0.531    0.547
                   Libor    0.518    0.536    0.550
          OUT      Black    0.248    0.248    0.248
                   Libor    0.251    0.253    0.252

5.6 Swaption delta

The validity of the approximate swaption delta equation is illustrated in Table 4, which compares values for a range of equivalently priced Black and Libor model swaptions at-, in- and out-of-the-money for the first volatility structure. The Black swaption delta is calculated using the true swap rate volatility (see Section 5.4), with the strike values chosen so that the values in- and out-of-the-money are approximately 0.75 and 0.25, respectively. The results show that the approximate method agrees well with the Black swaption delta, showing a slight, yet consistent, over-estimation of the true values.


Even for the more extreme swaptions under the second volatility structure (see Appendix A), the agreement is quite acceptable, with the values deviating by 4.5% at most and the average deviation being 0.1%. Note, however, that this deviation, for both volatility structures, tends to increase slightly as the swaptions move out-of-the-money.

5.7 Swaption gamma and vega

The Libor model gamma and vega equations (34) and (36) were tested against their Black counterparts (14) and (15), respectively, with the results shown in Table 5. As in Section 5.6, the Black swaption Greeks are calculated using the true volatility, and the same in- and out-of-the-money strike prices are used. Note that the vega results will be entirely analogous to the gamma results, as the vega is directly proportional to the gamma, as given by (16). We see in general for both gamma and vega that the agreement between the swaption behaviours is not as good as for the delta, yet is still quite acceptable, with most of the Libor model results within 5% of the Black values. Note that the Libor model equations tend to underestimate the values in-the-money, while overestimating out-of-the-money. Note also that the agreement between the values deteriorates with longer swaption maturity and length. This is also true for the second volatility structure, shown in Appendix A.

5.8 Swaption delta-hedging

The Libor model equation (30) gives an approximation to the partial derivative of the swaption price with respect to the swap rate. However, as explained in Section 2.3, in the Black–Scholes framework (or here, in the framework implied by the Black swaption formula) the delta is more than just a partial derivative: it represents a probability of exercise of the option and is fundamental to the concept of hedging. It would be interesting to know if this concept can also be extended to the case of the approximate Libor model delta. To test this, yield curve movements were simulated in the Libor model framework and swaptions hedged using the methodology from Section 2.3 and the approximate delta formula (30). Rebalancing was effected at a frequency of five times per quarter and, due to the lack of true (or simulation) prices and volatilities, the hedging was based on values given by the rank one Libor model price formula (8). For comparison purposes, the delta-hedge was run in conjunction with a Libor model hedge encompassing all the relevant Libor rates treated individually, as predicted from the partial derivatives with respect to the Libor rates given by (28).
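The mechanics of such a hedging test can be illustrated in the much simpler Black setting. The sketch below is not the Libor model simulation described above: it assumes a driftless lognormal swap rate, values everything per unit of annuity, hedges a sold payer swaption with the Black delta, and records the terminal profit and loss. The function names, parameters and random seed are hypothetical.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist, fmean, pstdev

STD_NORMAL = NormalDist()


def black_value_and_delta(omega, kappa, zeta):
    """Black payer value per unit of annuity and its delta N(d1)."""
    if zeta <= 1e-14:  # at expiry: intrinsic value and a 0/1 delta
        return max(omega - kappa, 0.0), (1.0 if omega > kappa else 0.0)
    root = sqrt(zeta)
    d1 = (log(omega / kappa) + 0.5 * zeta) / root
    value = omega * STD_NORMAL.cdf(d1) - kappa * STD_NORMAL.cdf(d1 - root)
    return value, STD_NORMAL.cdf(d1)


def hedge_pnl(omega0, kappa, sigma, expiry, rebalances, paths, seed=0):
    """Mean and standard deviation of the P/L from discrete delta hedging."""
    rng = random.Random(seed)
    dt = expiry / rebalances
    results = []
    for _ in range(paths):
        omega = omega0
        premium, delta = black_value_and_delta(omega, kappa, sigma ** 2 * expiry)
        pnl = premium  # premium received for the sold swaption
        for step in range(1, rebalances + 1):
            z = rng.gauss(0.0, 1.0)
            new_omega = omega * exp(-0.5 * sigma ** 2 * dt + sigma * sqrt(dt) * z)
            pnl += delta * (new_omega - omega)  # gain on the hedge position
            omega = new_omega
            remaining = sigma ** 2 * (expiry - step * dt)
            _, delta = black_value_and_delta(omega, kappa, remaining)
        pnl -= max(omega - kappa, 0.0)  # payoff owed at expiry
        results.append(pnl)
    return fmean(results), pstdev(results)


if __name__ == "__main__":
    # Five rebalances over a quarter, matching the frequency quoted above.
    mean, sd = hedge_pnl(0.07, 0.07, 0.20, 0.25, rebalances=5, paths=2000)
    print(mean, sd)  # multiply by annuity * 1e4 to express in basis points
```

In this simplified setting a mean close to zero and a standard deviation that shrinks as the rebalancing frequency rises are the outcomes one would hope to see.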


Table 5. Gamma and vega comparisons for Libor model and Black swaptions (for the first volatility structure).

Greek    Swaption                        Swaption maturity
type     length    Strike   Model     0.25       1        2        4
Gamma    1         IN       Black    193.5     73.7     28.1     11.3
                            Libor    192.8     73.4     27.9     11.1
                   AT       Black    243.0     92.5     35.3     14.0
                            Libor    242.8     92.5     35.2     13.9
                   OUT      Black    193.5     73.7     28.1     11.3
                            Libor    194.1     73.9     28.4     11.4
         2         IN       Black    124.3     44.1     20.1     10.7
                            Libor    123.4     43.3     19.6     10.4
                   AT       Black    156.1     55.3     25.1     13.2
                            Libor    155.8     55.1     24.9     13.1
                   OUT      Black    124.2     44.1     20.1     10.7
                            Libor    124.7     44.7     20.5     10.9
         4         IN       Black     59.6     26.2     16.1     10.6
                            Libor     58.2     25.3     15.5     10.1
                   AT       Black     74.9     32.8     20.1     13.1
                            Libor     74.5     32.6     19.9     12.9
                   OUT      Black     59.6     26.2     16.1     10.6
                            Libor     60.4     27.0     16.7     11.0
         8         IN       Black     52.9     25.2     16.8
                            Libor     51.3     24.0     15.7
                   AT       Black     66.6     31.6     21.0
                            Libor     65.9     31.2     20.6
                   OUT      Black     52.9     25.2     16.8
                            Libor     53.6     26.1     17.5
Vega     1         IN       Black    0.484    0.208    0.087    0.036
                            Libor    0.482    0.208    0.086    0.036
                   AT       Black    0.607    0.262    0.109    0.045
                            Libor    0.607    0.262    0.109    0.044
                   OUT      Black    0.484    0.208    0.087    0.036
                            Libor    0.485    0.209    0.088    0.037
         2         IN       Black    0.334    0.130    0.062    0.034
                            Libor    0.332    0.128    0.061    0.033
                   AT       Black    0.420    0.164    0.078    0.042
                            Libor    0.419    0.163    0.077    0.042
                   OUT      Black    0.334    0.130    0.062    0.034
                            Libor    0.336    0.132    0.063    0.035


Table 5. (cont.)

Greek    Swaption                        Swaption maturity
type     length    Strike   Model     0.25       1        2        4
Vega     4         IN       Black    0.172    0.080    0.051    0.035
                            Libor    0.168    0.077    0.049    0.033
                   AT       Black    0.216    0.100    0.063    0.043
                            Libor    0.215    0.099    0.063    0.042
                   OUT      Black    0.172    0.080    0.051    0.035
                            Libor    0.174    0.082    0.052    0.036
         8         IN       Black    0.162    0.080    0.055
                            Libor    0.156    0.076    0.051
                   AT       Black    0.203    0.100    0.068
                            Libor    0.201    0.099    0.067
                   OUT      Black    0.161    0.080    0.055
                            Libor    0.164    0.082    0.057

A more detailed explanation of the mathematics and methodology of the hedging simulation is beyond the scope of this chapter and can be found in Dun et al. (1999). Table 6 presents the results of these hedging tests in the form of means and standard deviations of the hedging profit and loss (P/L) for both volatility structures. A zero mean P/L with a small standard deviation is clearly the preferred outcome in any hedging exercise. The results show that the approximate Libor model delta hedge performs as well as individual hedges into the Libor rates, both in terms of P/L mean and standard deviation. All the rank one swaptions have been successfully hedged, with average P/Ls close to zero, while the rank two swaptions show some bias. This bias seems to be approximately equal to the difference between the true and rank one prices, and could probably be reduced by using the true volatility as the basis for the hedges rather than a rank one volatility as mentioned above. In general, however, the results imply that the approximate Libor model delta is useful for hedging, and that the intuition attached to the delta value in Black swaptions is also valid in the Libor model framework.

6 Conclusions

In conclusion, we have derived approximate equations within the lognormal forward Libor model which indicate that swaption pricing in this framework is quite close to market practice. A simple equation can be used to estimate the Black volatility of Libor model swaptions, which can then be priced using the Black

Table 6. Simulated delta hedging means (and standard deviations) for both volatility structures. Values expressed in basis points.

Volatility  Swaption  Hedging                          Swaption maturity
structure   length    method            0.25            1               2               4
1           1         Approx delta     0.0 (2.3)       0.0 (3.0)       0.0 (6.1)       0.1 (8.4)
                      Libor rates      0.0 (2.3)       0.0 (3.0)       0.0 (6.1)       0.1 (8.4)
            2         Approx delta     0.0 (6.7)       0.1 (9.6)       0.0 (14.5)     −0.1 (16.2)
                      Libor rates      0.0 (6.7)       0.1 (9.6)       0.0 (14.5)     −0.1 (16.2)
            4         Approx delta     0.0 (26.7)      0.3 (28.8)     −0.3 (30.8)      0.0 (28.4)
                      Libor rates      0.0 (26.7)      0.3 (28.8)     −0.3 (30.8)      0.0 (28.3)
            8         Approx delta     0.4 (50.2)     −0.6 (52.6)     −0.8 (51.3)
                      Libor rates      0.4 (50.2)     −0.6 (52.5)     −0.8 (51.2)
2           1         Approx delta     0.0 (13.7)      0.0 (14.4)      0.0 (14.4)      0.0 (8.0)
                      Libor rates      0.0 (13.7)      0.0 (14.4)      0.0 (14.4)      0.0 (8.0)
            2         Approx delta     0.0 (23.5)      0.0 (23.9)      0.2 (21.1)     −0.2 (14.4)
                      Libor rates      0.0 (23.4)      0.0 (23.9)      0.2 (21.0)     −0.2 (14.4)
            4         Approx delta     0.1 (36.4)     −1.4 (36.2)     −4.6 (33.5)     −0.7 (26.8)
                      Libor rates      0.1 (36.4)     −1.4 (36.1)     −4.6 (33.4)     −0.7 (26.8)
            8         Approx delta    −9.8 (64.7)    −15.9 (65.1)    −14.0 (60.6)
                      Libor rates     −9.9 (64.7)    −15.9 (65.3)    −14.0 (60.6)


swaption formula. Equations for swaption Greeks in the Libor model were derived and shown to retain their Black swaption significance, while Libor model swaptions could be successfully hedged with the swaption delta derived. Estimates are accurate while the assumption of a rank one swaption covariance matrix holds, although even when violated, the estimates are still surprisingly close to the true values. Swaption maturity, length and strike value do exhibit a slight influence on the estimates.

Overall, the results support the idea that the Libor model could be used for all swaption pricing – as well as caps and exotics pricing – since it can be calibrated to both caps and swaptions markets simultaneously. Conversely, the results could be used to support the idea in Jamshidian (1997) that models which are robust and adapted to the products being priced should be used – even if this means using mutually exclusive models – since we have shown that the Libor and Black (and hence by extension the swap rate) approaches are, numerically, not so different.

This study still leaves some questions unanswered, providing scope for further work. This includes, for example, the derivation of analytic bounds for the approximations presented here, an analysis of the closeness of the models when pricing exotics, and an investigation into the impact of using the assumptions of Section 4.1 to simplify exotics pricing.

Appendix A. Results for the second volatility structure

Comparisons of prices, deltas, gammas and vegas for the second volatility structure not tabulated in the body of the paper appear in Tables 7–9.

Appendix B. Rank one and separable volatility

If the volatility function is separable, all swaption quadratic variation matrices are of rank one. On the other hand, if a swaption quadratic variation matrix is of rank one, for arbitrary $T$ and $T_i = T + i\delta$, we must have

$$\int_0^t \gamma^2(s,T)\,ds \int_0^t \gamma^2(s,T_i)\,ds = \left(\int_0^t \gamma(s,T)\,\gamma(s,T_i)\,ds\right)^2.$$

The following lemma shows that if this condition is strengthened, separability follows.

Lemma 1 Let the LFM volatility function $\gamma(\cdot)$ be well behaved, and satisfy

$$\int_0^t \gamma^2(s,u)\,ds \int_0^t \gamma^2(s,v)\,ds = \left(\int_0^t \gamma(s,u)\,\gamma(s,v)\,ds\right)^2 \qquad (37)$$

for all relevant $t, u, v$. Then $\gamma(\cdot)$ is separable.


Table 7. Swaption price comparisons for the second volatility structure.

Swaption           Price                    Swaption maturity
length    Strike   description      0.25        1         2         4
1         IN       true            69.84    123.44    159.82    121.77
                   vol approx      69.91    123.62    160.09    121.74
                   rank 1          69.83    123.44    159.87    121.76
          AT       true            36.95     69.31     92.79     75.99
                   vol approx      37.01     69.44     93.04     75.95
                   rank 1          36.94     69.31     92.86     75.98
          OUT      true            13.07     23.62     30.89     24.24
                   vol approx      13.08     23.63     30.98     24.16
                   rank 1          13.06     23.62     30.95     24.20
2         IN       true           121.42    220.52    249.97    229.35
                   vol approx     121.84    221.27    249.57    229.20
                   rank 1         121.26    220.12    250.11    229.37
          AT       true            63.03    120.88    143.94    143.69
                   vol approx      63.43    121.58    143.18    143.52
                   rank 1          62.87    120.52    144.02    143.72
          OUT      true            22.48     41.67     49.02     45.80
                   vol approx      22.66     41.96     48.07     45.55
                   rank 1          22.37     41.50     48.93     45.68
4         IN       true           194.98    343.86    416.41    433.46
                   vol approx     195.34    342.57    413.61    432.54
                   rank 1         194.66    342.88    413.32    433.03
          AT       true           100.24    188.39    241.57    279.52
                   vol approx     100.60    186.81    237.83    278.05
                   rank 1          99.93    187.18    237.41    278.76
          OUT      true            35.99     66.04     83.07     88.45
                   vol approx      36.18     64.78     79.73     86.78
                   rank 1          35.82     65.07     79.32     87.50
8         IN       true           337.74    599.84    728.68
                   vol approx     333.90    590.10    715.81
                   rank 1         329.26    587.22    719.30
          AT       true           178.09    340.50    441.36
                   vol approx     173.28    328.00    424.19
                   rank 1         167.57    324.92    429.17
          OUT      true            65.70    122.64    153.97
                   vol approx      62.01    112.37    139.49
                   rank 1          57.64    110.47    144.33


Table 8. Delta comparisons for Libor model and Black swaptions for the second volatility structure.

Swaption                         Swaption maturity
length    Strike   Model     0.25       1        2        4
1         IN       Black    0.750    0.750    0.750    0.750
                   Libor    0.751    0.751    0.751    0.750
          AT       Black    0.523    0.539    0.549    0.570
                   Libor    0.524    0.540    0.550    0.570
          OUT      Black    0.250    0.249    0.249    0.250
                   Libor    0.250    0.251    0.250    0.250
2         IN       Black    0.751    0.751    0.749    0.750
                   Libor    0.753    0.753    0.750    0.750
          AT       Black    0.519    0.533    0.546    0.572
                   Libor    0.520    0.535    0.546    0.572
          OUT      Black    0.248    0.248    0.252    0.250
                   Libor    0.250    0.250    0.252    0.250
4         IN       Black    0.751    0.749    0.748    0.750
                   Libor    0.752    0.750    0.751    0.752
          AT       Black    0.516    0.532    0.547    0.580
                   Libor    0.516    0.531    0.547    0.582
          OUT      Black    0.249    0.252    0.255    0.252
                   Libor    0.249    0.251    0.250    0.253
8         IN       Black    0.745    0.744    0.745
                   Libor    0.759    0.757    0.755
          AT       Black    0.518    0.538    0.557
                   Libor    0.521    0.543    0.563
          OUT      Black    0.257    0.260    0.262
                   Libor    0.245    0.254    0.262

Proof Set

$$a(t,u) = \sqrt{\int_0^t \gamma^2(s,u)\,ds}, \qquad \dot a(t,u) = \frac{\partial a(t,u)}{\partial t},$$

rewrite (37) as

$$\int_0^t \gamma(s,u)\,\gamma(s,v)\,ds = a(t,u)\,a(t,v),$$


Table 9. Gamma and vega comparisons for Libor model and Black swaptions for the second volatility structure.

Greek    Swaption                        Swaption maturity
type     length    Strike   Model     0.25       1        2        4
Gamma    1         IN       Black     32.0     15.9     10.6     10.7
                            Libor     31.8     15.7     10.4     10.6
                   AT       Black     40.1     19.8     13.2     13.2
                            Libor     40.0     19.8     13.1     13.2
                   OUT      Black     32.0     15.8     10.6     10.7
                            Libor     32.1     16.0     10.7     10.8
         2         IN       Black     35.7     17.2     13.0     10.9
                            Libor     35.1     16.8     12.8     10.7
                   AT       Black     44.9     21.6     16.2     13.4
                            Libor     44.6     21.4     16.2     13.4
                   OUT      Black     35.7     17.2     13.0     10.9
                            Libor     35.8     17.4     13.3     11.1
         4         IN       Black     40.7     20.2     14.3     10.4
                            Libor     40.0     20.0     14.2     10.0
                   AT       Black     51.1     25.2     17.7     12.8
                            Libor     50.9     25.5     18.2     12.7
                   OUT      Black     40.6     20.2     14.4     10.4
                            Libor     41.0     20.8     15.1     11.0
         8         IN       Black     39.4     19.3     13.7
                            Libor     41.0     19.6     13.5
                   AT       Black     48.9     23.9     16.8
                            Libor     53.4     25.7     17.6
                   OUT      Black     39.6     19.5     13.9
                            Libor     43.1     21.8     15.6
Vega     1         IN       Black    0.118    0.081    0.078    0.037
                            Libor    0.117    0.080    0.077    0.037
                   AT       Black    0.147    0.101    0.098    0.046
                            Libor    0.147    0.101    0.097    0.046
                   OUT      Black    0.118    0.081    0.078    0.037
                            Libor    0.118    0.082    0.079    0.038
         2         IN       Black    0.163    0.106    0.074    0.036
                            Libor    0.160    0.103    0.073    0.035
                   AT       Black    0.205    0.132    0.092    0.044
                            Libor    0.204    0.131    0.092    0.044
                   OUT      Black    0.163    0.105    0.074    0.036
                            Libor    0.163    0.107    0.076    0.036


Table 9. (cont.)

Greek    Swaption                        Swaption maturity
type     length    Strike   Model     0.25       1        2        4
Vega     4         IN       Black    0.199    0.101    0.064    0.030
                            Libor    0.196    0.100    0.064    0.029
                   AT       Black    0.250    0.126    0.080    0.036
                            Libor    0.249    0.127    0.082    0.036
                   OUT      Black    0.199    0.101    0.065    0.030
                            Libor    0.200    0.104    0.068    0.031
         8         IN       Black    0.155    0.074    0.046
                            Libor    0.161    0.075    0.045
                   AT       Black    0.192    0.091    0.056
                            Libor    0.210    0.098    0.059
                   OUT      Black    0.155    0.074    0.046
                            Libor    0.169    0.083    0.052

differentiate with respect to time $t$ to get

$$\frac{\dot a(t,u)}{a(t,u)} + \frac{\dot a(t,v)}{a(t,v)} = \frac{\gamma(t,u)\,\gamma(t,v)}{a(t,u)\,a(t,v)},$$

and then with respect to $v$ to get

$$\frac{\partial}{\partial v}\left[\frac{\dot a(t,v)}{a(t,v)}\right] \bigg/ \frac{\partial}{\partial v}\left[\frac{\gamma(t,v)}{a(t,v)}\right] = \frac{\gamma(t,u)}{a(t,u)}. \qquad (38)$$

Since the left hand side of (38) is a function of only $t$ and $v$, while the right hand side is a function of only $t$ and $u$, both must be functions of just $t$. For some function $b(t)$, we must therefore have

$$\int_0^t \gamma^2(s,u)\,ds = b(t)\,\gamma^2(t,u).$$

Differentiation with respect to $t$, rearrangement, and then integration with respect to $t$ gives

$$\frac{\partial\gamma^2(t,u)}{\partial t} \bigg/ \gamma^2(t,u) = \frac{1 - \dot b(t)}{b(t)},$$

$$\ln\left[\gamma(t,u)\right] = \frac{1}{2}\int_0^t \frac{1 - \dot b(s)}{b(s)}\,ds + c(u),$$


Fig. 3. Graphical representation of the first volatility structure.

where $c(\cdot)$ is an arbitrary function of $u$. Setting

$$\psi(t) = \exp\left(\frac{1}{2}\int_0^t \frac{1 - \dot b(s)}{b(s)}\,ds\right), \qquad \phi(u) = \exp\left(c(u)\right),$$

gives $\gamma(t,u) = \psi(t)\,\phi(u)$, which is the result.
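Condition (37) is the equality case of the Cauchy–Schwarz inequality, and it can be checked numerically for a given volatility function. The sketch below uses a midpoint rule and two example volatilities: an exponential form $0.3\exp(-0.54(T-t))$, which factorises as $\psi(t)\phi(T)$, and a linear form $0.05(T-t)$, which does not. The function names and the evaluation points are hypothetical choices made for illustration.

```python
from math import exp


def integrate(f, t, steps=2000):
    """Midpoint-rule approximation of the integral of f over [0, t]."""
    h = t / steps
    return h * sum(f((m + 0.5) * h) for m in range(steps))


def rank_one_gap(gamma, t, u, v):
    """Relative gap between the two sides of (37); zero when (37) holds."""
    a = integrate(lambda s: gamma(s, u) ** 2, t)
    b = integrate(lambda s: gamma(s, v) ** 2, t)
    c = integrate(lambda s: gamma(s, u) * gamma(s, v), t)
    return (a * b - c * c) / (a * b)


def separable(t, T):
    # 0.3 * exp(-0.54 * T) * exp(0.54 * t), i.e. of the form psi(t) * phi(T)
    return 0.3 * exp(-0.54 * (T - t))


def linear(t, T):
    # depends on T - t in a way that does not factorise
    return 0.05 * (T - t)


if __name__ == "__main__":
    print(rank_one_gap(separable, 1.0, 2.0, 4.0))  # numerically zero
    print(rank_one_gap(linear, 1.0, 2.0, 4.0))     # strictly positive
```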

Appendix C. Yield curve and volatility structures

C.1 Market fit volatility structure

The first volatility structure (Figure 3) is a simple one-factor homogeneous parameterisation to market data – the first six months of 1997 UK market data being used here. The yield curve used (Figure 4) is a typical one for that period of time.

C.2 Pathological volatility structure

The second volatility structure was chosen intentionally to be pathological, or representative of an extreme market situation. The functions were also optimised in order to ensure that some of the 15 swaptions to be tested had extreme rank two swaption covariance matrices.


Fig. 4. Forward Libor rates used in conjunction with the first volatility structure.

Fig. 5. Yield curve associated with the second volatility structure.

The functional form chosen for the yield curve was

$$\text{Yield}(T) = \begin{cases} 0.07 + 0.03\,T/3 & \text{for } T < 3, \\ 0.10 - 0.02\,(T-3)/7 & \text{otherwise,} \end{cases}$$


Fig. 6. Graphical representation of the two factors of the second volatility structure.

and is shown in Figure 5, while the equations for the volatility were

$$\gamma_1(t,T) = \begin{cases} 0.05\,(T-t) & \text{for } (T-t) < 6, \\ 0.3 & \text{otherwise,} \end{cases} \qquad \gamma_2(t,T) = 0.3\exp\left(-0.54\,(T-t)\right),$$

and these are graphed in Figure 6.

References

Brace, A. (1996), Dual swap and swaption formulae in the normal and lognormal models. University of New South Wales Preprint.
Brace, A. (1998), Simulation in the GHJM and LFM models. FMMA notes.
Brace, A., Gatarek, D. and Musiela, M. (1997), The market model of interest rate dynamics. Math. Finance 7, 127–54.
Dudenhausen, A., Schlögl, E. and Schlögl, L. (1998), Robustness of Gaussian hedges under parameter and model misspecification. Working paper, University of Bonn.
Dun, T., Schlögl, E. and Barton, G. (1999), Simulated swaption delta-hedging in the lognormal forward LIBOR model. Forthcoming in the International Journal of Theoretical and Applied Finance 4(1) 2001.
Glasserman, P. and Zhao, X. (2000), Arbitrage-free discretization of lognormal forward LIBOR and swap rate models. Finance Stochast. 4(1), 35–68.
Hunt, P., Kennedy, J. and Pelsser, A. (1997), Markov functional interest rate models. ABN Amro preprint.
Jamshidian, F. (1997), Libor and swap market models and measures. Finance Stochast. 1, 293–330.
Miltersen, K., Sandmann, K. and Sondermann, D. (1997), Closed form solutions for term structure derivatives with lognormal interest rates. J. Finance 52, 407–30.

8. Towards a Central Interest Rate Model

313

Musiela, M. and Rutkowski, M. (1997a) Martingale Methods in Financial Modelling. Springer-Verlag, Berlin. Musiela, M., Rutkowski, M. (1997b) Continuous-time term structure models: a forward measure approach. Finance Stochast. 1, 261–91. Plackett, R.L. (1954), A reduction formula for normal multivariate integrals Biometrika 41, 351–60. Rebonato, R. (1999), On the pricing implications of the joint log-normal assumptions for the swaption and cap markets. Journal of Computational Finance 2(3), 57–76.

9 Infinite Dimensional Diffusions, Kolmogorov Equations and Interest Rate Models B. Goldys and M. Musiela

1 Introduction

A common feature of interest rate models that take the model of Heath, Jarrow and Morton (Heath et al. 1992) as a starting point is that they lead naturally to infinite dimensional Markov processes which describe the arbitrage-free dynamics of forward rates. By a forward rate $r(t,x)$ we mean the continuously compounded forward rate prevailing at time $t$ over the time interval $[t+x,\,t+x+dx]$. Usually, the time evolution of the forward curves $r(t,\cdot)$ is completely determined by the initial curve and the volatility structure. The question of how to determine the volatility structure is a delicate one and different approaches can be chosen to address it; for possible answers see Musiela (1993), Brace and Musiela (1994), Goldys et al. (1995) or Brace et al. (1997). In this chapter, however, we assume that the volatility structure $\{\sigma(t,x) : t \ge 0,\ x \ge 0\}$ is a known vector-valued stochastic process. In that case the forward rate process $\{r(t,x) : t \ge 0,\ x \ge 0\}$ must satisfy the following stochastic partial differential equation

$$dr(t,x) = \frac{\partial}{\partial x}\left[\left(r(t,x) + \frac{1}{2}|\sigma(t,x)|^2\right)dt + \sigma(t,x)\,dW(t)\right] \tag{1.1}$$

for all $t, x \ge 0$, where $W$ is a $d$-dimensional Brownian motion. It has been shown in Musiela (1993) that (1.1) is sufficient for the nonarbitrage condition. We will concentrate on two models:

• the Gaussian $r(t,x)$ model, for its theoretical and computational simplicity,
• the BGM model.

We start with the derivation of the stochastic PDE which is satisfied by the forward rate process $\{r(t,x) : t, x \ge 0\}$. We model the uncertainty of future interest rate movements using an infinite family of Wiener processes $\{W_k : k \ge 1\}$ defined on the common stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$. We assume that $(\mathcal{F}_t)$ is a $P$-augmentation of the natural filtration $\sigma(W_k(s) : s \le t,\ k \ge 1)$. Let $\{X(t,x) : t, x \ge 0\}$ be an arbitrary random field. We say that $X$ is adapted to the filtration $(\mathcal{F}_t)$ if $\sigma(X(s,x) : s \le t,\ x \ge 0) \subset \mathcal{F}_t$ for every $t \ge 0$. Let $P(t,T)$ denote the price at time $t \ge 0$ of a zero-coupon bond with maturity $T \ge t$. We assume that

$$P(t,T) = \exp\left(-\int_0^{T-t} r(t,u)\,du\right) \tag{1.2}$$

for a certain measurable random field $\{r(t,x) : t, x \ge 0\}$ which is locally bounded: for every $T > 0$

$$\sup_{t,x \le T} |r(t,x)| < \infty, \quad P\text{-a.s.} \tag{1.3}$$

It follows that the savings account process

$$\beta(t) = \exp\left(\int_0^t r(u,0)\,du\right), \quad t \ge 0,$$

is well defined. The discounted price of the zero-coupon bond is defined as

$$N(t,T) = \frac{P(t,T)}{\beta(t)}, \quad t \le T. \tag{1.4}$$

Theorem 1.1 Let (1.3) hold and let the random field $r$ be adapted to $(\mathcal{F}_t)$. Assume that for every $T > 0$ the process $\{N(t,T) : t \le T\}$ is a $(P, (\mathcal{F}_t))$-martingale and, moreover,

$$E\int_0^R \langle \log N(\cdot,T)\rangle_t\,dT < \infty, \quad R > 0. \tag{1.5}$$

Then there exists a family $\{\sigma_k : k \ge 1\}$ of adapted random fields such that for every $T > 0$ and $k \ge 1$

$$\sup_{t,x \le T} |\sigma_k(t,x)| < \infty, \quad P\text{-a.s.},$$

$$\sum_{k=1}^{\infty}\int_0^T\!\!\int_0^T \sigma_k^2(t,x)\,dx\,dt < \infty, \quad P\text{-a.s.},$$

and

$$\int_0^x r(t,u)\,du + \int_0^t r(s,0)\,ds = \int_0^{x+t} r(0,u)\,du + \sum_{k=1}^{\infty}\int_0^t \sigma_k(s, x+t-s)\,dW_k(s) + \frac{1}{2}\sum_{k=1}^{\infty}\int_0^t \sigma_k^2(s, x+t-s)\,ds.$$

Proof For every $T > 0$ the process $N(\cdot,T)$ is continuous and positive. Fix $R > 0$ and define the process $N$ for all $t \ge 0$ and $T \in [0,R]$ putting $N(t,T) = N(T,T)$ for $t \ge T$. Then for every $T \le R$ the process $\{N(t,T) : t \le R\}$ is a continuous square integrable martingale. Therefore, for every $T > 0$ there exists a continuous local martingale $M(\cdot,T)$ with $M(0,T) = 0$ such that

$$N(t,T) = P(0,T)\exp\left(-M(t,T) - \frac{1}{2}\langle M(\cdot,T)\rangle_t\right), \quad T \le R,$$

and $M(t,T) = M(T,T)$ for $t \ge T$. By (1.5) $M(t,\cdot)$ is an $L^2(0,R)$-valued continuous martingale for every $R > 0$. It follows from Theorem 8.2 in Da Prato and Zabczyk (1992) that there exists a family $\{h_k : k \ge 1\}$ of predictable $L^2(0,R)$-valued processes, such that for $t, T \le R$

$$M(t,T) = \sum_{k=1}^{\infty}\int_0^t h_k(s,T)\,dW_k(s)$$

and

$$E\sum_{k=1}^{\infty}\int_0^t\!\!\int_0^R h_k^2(s,T)\,dT\,ds < \infty.$$

It is easy to see that the processes $h_k$, $k \ge 1$, may be chosen independently of $R$. Hence, for $t, x \ge 0$ we may define $\sigma_k(t,x) = h_k(t, x+t)$ and then

$$N(t, x+t) = \exp\left(-\int_0^{t+x} r(0,u)\,du - \sum_{k=1}^{\infty}\int_0^t \sigma_k(s, x+t-s)\,dW_k(s) - \frac{1}{2}\sum_{k=1}^{\infty}\int_0^t \sigma_k^2(s, x+t-s)\,ds\right)$$

and the theorem follows.

In the sequel we assume that for each $x \ge 0$

$$dr(t,x) = g(t,x)\,dt + \sum_{k=1}^{\infty}\tau_k(t,x)\,dW_k(t). \tag{1.6}$$

The random fields $\{g(t,x) : t, x \ge 0\}$ and $\{\tau_k(t,x) : t, x \ge 0\}$, $k \ge 1$, satisfy the following conditions.

(C1) For every $T > 0$

$$\sup_{t,x \le T} |g(t,x)| < \infty, \quad P\text{-a.s.},$$

and for every $T > 0$ and $k \ge 1$

$$\sup_{t,x \le T} |\tau_k(t,x)| < \infty, \quad P\text{-a.s.}$$

(C2) For every $T > 0$

$$\sum_{k=1}^{\infty}\int_0^T\!\!\int_0^T \tau_k^2(t,x)\,dx\,dt < \infty, \quad P\text{-a.s.}$$

(C3) For every $t > 0$

$$\sigma(g(s,x) : s \le t,\ x \ge 0) \cup \sigma(\tau_k(s,x) : s \le t,\ x \ge 0,\ k \ge 1) \subset \mathcal{F}_t.$$

(C4) $\sigma\{r(0,x) : x \ge 0\} \subset \mathcal{F}_0$ and for every $T > 0$

$$\sup_{x \le T} |r(0,x)| < \infty.$$

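Theorem 1.2 Assume that for all $t, x \ge 0$

$$\int_0^x g(t,u)\,du = r(t,x) - r(t,0) + \frac{1}{2}\sum_{k=1}^{\infty}\left(\int_0^x \tau_k(t,u)\,du\right)^2. \tag{1.7}$$

Then for all $T > 0$ the process

$$M_T(t) = \frac{P(t,T)}{\beta(t)}, \quad t \in [0,T],$$

is a $P$-local martingale. If, in addition, the process $\{r(t,x) : t, x \ge 0\}$ is bounded on $[0,T] \times \Omega$ for all $T > 0$, then it is a $P$-martingale.

Proof We have

$$\begin{aligned}
d\log P(t,T) &= -d\int_0^{T-t} r(t,u)\,du \\
&= r(t,T-t)\,dt - \int_0^{T-t}\left[g(t,u)\,dt + \sum_{k=1}^{\infty}\tau_k(t,u)\,dW_k(t)\right]du \\
&= r(t,T-t)\,dt - \left(\int_0^{T-t} g(t,u)\,du\right)dt - \sum_{k=1}^{\infty}\left(\int_0^{T-t}\tau_k(t,u)\,du\right)dW_k(t).
\end{aligned}$$

Hence, the quadratic variation of $\log P(\cdot,T)$ is given by

$$d\langle \log P(\cdot,T)\rangle(t) = \sum_{k=1}^{\infty}\left(\int_0^{T-t}\tau_k(t,u)\,du\right)^2 dt.$$

Therefore,

$$dP(t,T) = P(t,T)\left[r(t,T-t) - \int_0^{T-t} g(t,u)\,du + \frac{1}{2}\sum_{k=1}^{\infty}\left(\int_0^{T-t}\tau_k(t,u)\,du\right)^2\right]dt - P(t,T)\sum_{k=1}^{\infty}\left(\int_0^{T-t}\tau_k(t,u)\,du\right)dW_k(t).$$

By (1.7) with $x = T-t$ the drift in the square bracket reduces to $r(t,0)$, so the last equation yields

$$\frac{P(t,T)}{\beta(t)} = P(0,T)\exp\left(-\sum_{k=1}^{\infty}\int_0^t\left(\int_0^{T-s}\tau_k(s,u)\,du\right)dW_k(s) - \frac{1}{2}\sum_{k=1}^{\infty}\int_0^t\left(\int_0^{T-s}\tau_k(s,u)\,du\right)^2 ds\right) \tag{1.8}$$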

which concludes the proof.

Remark 1.3 The above theorem has been proved in Musiela (1993) for the finite dimensional Wiener process, that is, when for a certain $d \ge 1$, $\tau_k = 0$ for $k > d$. An extension to the case when the number of driving Wiener processes is infinite has been proposed in Santa-Clara and Sornette (1997).

We will reparametrize equation (1.8) putting $T = t + x$. Since

$$P(0, t+x) = \exp\left(-\int_0^{t+x} r(0,u)\,du\right),$$

we find that (1.8) takes the form

$$\frac{P(t,t+x)}{\beta(t)} = \exp\left(-\int_0^{t+x} r(0,u)\,du\right)\cdot\exp\left(-\sum_{k=1}^{\infty}\int_0^t\left(\int_0^{t+x-s}\tau_k(s,u)\,du\right)dW_k(s) - \frac{1}{2}\sum_{k=1}^{\infty}\int_0^t\left(\int_0^{t+x-s}\tau_k(s,u)\,du\right)^2 ds\right). \tag{1.9}$$

Under the appropriate regularity conditions on the coefficients $\tau_k$ we obtain formally from (1.9), by taking logarithms and differentiating with respect to $x$,

$$r(t,x) = r(0,t+x) + \sum_{k=1}^{\infty}\int_0^t \tau_k(s, x+t-s)\int_0^{x+t-s}\tau_k(s,u)\,du\,ds + \sum_{k=1}^{\infty}\int_0^t \tau_k(s, x+t-s)\,dW_k(s). \tag{1.10}$$
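As a quick sanity check of (1.10), consider the simplest illustrative case. This is an assumption made here for the example and is not taken from the text: a single driving Wiener process with constant volatility, $\tau_1(s,x) \equiv \sigma$ and $\tau_k \equiv 0$ for $k \ge 2$. The inner integral is then $\sigma(x+t-s)$ and both integrals in (1.10) can be evaluated in closed form:

```latex
\begin{aligned}
r(t,x) &= r(0,t+x) + \sigma^2 \int_0^t (x+t-s)\,ds + \sigma \int_0^t dW_1(s) \\
       &= r(0,t+x) + \sigma^2\left(xt + \tfrac{1}{2}t^2\right) + \sigma W_1(t).
\end{aligned}
```

This is the familiar Ho–Lee type dynamics written in the time-to-maturity parametrization: the initial curve is shifted to the left, a deterministic convexity term is added, and every point of the curve receives the same Gaussian shock.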

If we assume that τ k (s, x) = f k (r (u, y) : u ≤ s, y ≥ 0) (x) for k ≥ 1 then (1.10) defines a stochastic integral equation for the random field {r (t, x) : t, x ≥ 0}. Such an approach has been studied in Kennedy (1994) and Hamza and Klebaner (1995).

In this chapter we take another approach, well known in the theory of stochastic partial differential equations. We will transform (1.10) into a stochastic evolution equation in an appropriate function space. To this end we first define a scale of weighted $L^2$-spaces in the following way. We assume that for every $t \ge 0$ the forward curve $r(t,x)$ is defined for all $x \ge 0$. Hence, the state of the forward rate process $r(t)$ at time $t$ is the curve $\{r(t,x) : x \ge 0\}$. In order to allow bounded, for example constant, forward rates, we assume that for a certain $\alpha > 0$

$$\int_0^{\infty} r^2(t,x)\,e^{-\alpha x}\,dx < \infty, \quad P\text{-a.s.}$$

It follows that a state space for the process $\{r(t) : t \ge 0\}$ is the space $L^2_\alpha(0,\infty)$ of functions with the finite norm

$$\|f\|_\alpha^2 = \int_0^{\infty} f^2(x)\,e^{-\alpha x}\,dx.$$

The space $L^2_\alpha(0,\infty)$ is a Hilbert space with the inner product

$$\langle f, g\rangle_\alpha = \int_0^{\infty} f(x)g(x)\,e^{-\alpha x}\,dx.$$

For $f \in L^2_\alpha(0,\infty)$ we define the semigroup of left shifts

$$S(t)f(\cdot) = f(t + \cdot), \quad t \ge 0.$$

Then (1.10) may be rewritten as

$$r(t) = S(t)r_0 + \sum_{k=1}^{\infty}\int_0^t S(t-s)\left[\tau_k(s)\int_0^{\cdot}\tau_k(s,u)\,du\right]ds + \sum_{k=1}^{\infty}\int_0^t S(t-s)\tau_k(s)\,dW_k(s).$$

We will restrict our considerations to the class of forward rate processes defined by the Markovian dynamics on $L^2_\alpha(0,\infty)$, that is, we assume that $\tau_k(s) = \tau_k(s, r(s))(\cdot) \in L^2_\alpha(0,\infty)$, where the same notation $\tau_k$ is preserved. Then

$$r(t) = S(t)r_0 + \sum_{k=1}^{\infty}\int_0^t S(t-s)\left[\tau_k(s,r(s))\int_0^{\cdot}\tau_k(s,r(s))(u)\,du\right]ds + \sum_{k=1}^{\infty}\int_0^t S(t-s)\tau_k(s,r(s))\,dW_k(s). \tag{1.11}$$
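The weighted norm and the left-shift semigroup are easy to experiment with numerically. The sketch below is illustrative only: the function names, the truncation of the half-line at `x_max` and the trapezoidal rule are choices made here, not part of the text.

```python
import numpy as np


def weighted_norm_sq(f, alpha, x_max=200.0, n=200_001):
    """Approximate ||f||_alpha^2 = int_0^inf f(x)^2 exp(-alpha x) dx.

    The half-line is truncated at x_max (an assumption that is harmless
    when exp(-alpha * x_max) is negligible) and integrated by trapezoids.
    """
    x = np.linspace(0.0, x_max, n)
    y = f(x) ** 2 * np.exp(-alpha * x)
    dx = x[1] - x[0]
    return float(dx * (y.sum() - 0.5 * (y[0] + y[-1])))


def left_shift(f, t):
    """Return S(t) f, i.e. the curve x -> f(t + x)."""
    return lambda x: f(x + t)


# A constant forward curve has infinite unweighted L^2 norm on (0, inf),
# but a finite weighted norm c^2 / alpha for every alpha > 0.
flat_curve = lambda x: 0.05 * np.ones_like(x)
flat_norm_sq = weighted_norm_sq(flat_curve, alpha=0.1)   # close to 0.05**2 / 0.1

# Shifting a constant curve leaves it unchanged, hence also its norm.
shifted_norm_sq = weighted_norm_sq(left_shift(flat_curve, 2.0), alpha=0.1)
```

The example shows why the weight $e^{-\alpha x}$ with $\alpha > 0$ is introduced: it is exactly what admits constant (and, more generally, bounded) forward curves into the state space.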

Let $G : L^2_\alpha(0,\infty) \to L^2_\alpha(0,\infty)$ be defined by the formula

$$G(t,f)(x) = \sum_{k=1}^{\infty}\tau_k(t,f)(x)\int_0^x \tau_k(t,f)(u)\,du,$$

and let

$$\tau(t,f) = \sum_{k=1}^{\infty}\tau_k(t,f)\,e_k,$$

where $\{e_k : k \ge 1\}$ is a complete orthonormal system in $L^2_\alpha(0,\infty)$. We denote by

$$W(t) = \sum_{k=1}^{\infty} W_k(t)\,e_k, \quad t \ge 0,$$

the standard cylindrical Wiener process on $L^2_\alpha(0,\infty)$. By this we mean that $W$ is a process of continuous random functionals on $L^2_\alpha(0,\infty)$ with the properties:

$$\langle W(t), f\rangle \sim N\left(0,\ t\|f\|^2\right), \quad t \ge 0,\ f \in L^2_\alpha(0,\infty),$$

$$E\,\langle W(t), f\rangle\langle W(s), g\rangle = \langle f, g\rangle \min(s,t).$$

Then (1.11) takes the form of the following integral equation in $L^2_\alpha(0,\infty)$:

$$r(t) = S(t)r_0 + \int_0^t S(t-s)G(s,r(s))\,ds + \int_0^t S(t-s)\tau(s,r(s))\,dW(s). \tag{1.12}$$

Definition 1.4 The $L^2_\alpha(0,\infty)$-valued $(\mathcal{F}_t)$-predictable process $r$ is a solution to (1.12) with the initial condition $r_0 \in L^2_\alpha(0,\infty)$ if

(a) for all $t \ge 0$

$$\int_0^t \|G(s,r(s))\|\,ds + \int_0^t \|\tau(s,r(s))\|_2^2\,ds < \infty, \quad P\text{-a.s.},$$

where

$$\|\tau(s,r(s))\|_2^2 = \sum_{k=1}^{\infty}\|\tau_k(s,r(s))\|^2;$$

(b) for every $t \ge 0$ equation (1.12) holds $P$-a.s.

In the theorem below we use the general theory of equations of type (1.12) developed in Da Prato and Zabczyk (1992) to provide conditions for existence and uniqueness of solutions to (1.12).

Theorem 1.5 Assume that piecewise continuous functions $\tau_k : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}_+$, $k \le d$, satisfy the following conditions: for every $T > 0$ there exists $C_T > 0$ such that

$$\sup_{x \ge 0,\ t \le T} \tau_k(t,x) < \infty,$$

$$|\tau_k(t,x) - \tau_k(t,y)| \le C_T |x - y|, \quad t \le T.$$

Then for every $\alpha > 0$ there exists a unique solution to (1.12) for every $r_0 \in L^2_\alpha(0,\infty)$.

Remark 1.6 The above theorem does not assure positivity of forward rates. If we assume that $r_0 \ge 0$ then under appropriate conditions on $\tau_k$ one may obtain existence and uniqueness of nonnegative solutions. We do not pursue this topic here. For an example of equation (1.12) with nonnegative solutions see Goldys et al. (1995).

It is well known that equation (1.10) is intimately related to the stochastic partial differential equation

$$\begin{cases}
dr(t,x) = \left(\dfrac{\partial r}{\partial x}(t,x) + \displaystyle\sum_{k=1}^{\infty}\tau_k(t,r(t,x))\int_0^x \tau_k(t,r(t,y))\,dy\right)dt + \displaystyle\sum_{k=1}^{\infty}\tau_k(t,r(t,x))\,dW_k(t), \\[2ex]
r(0,x) = r_0(x).
\end{cases} \tag{1.13}$$

We will discuss this relationship at the level of the evolution equation (1.12). In the space $L^2_\alpha(0,\infty)$ we introduce an operator $A = \frac{\partial}{\partial x}$ with the domain

$$\mathrm{dom}(A) = H^1_\alpha(0,\infty) = \left\{f \in L^2_\alpha(0,\infty) : \int_0^{\infty}\left(\frac{\partial f}{\partial x}(x)\right)^2 e^{-\alpha x}\,dx < \infty\right\},$$

where the derivative is meant in the generalized sense. Equation (1.13) considered in $L^2_\alpha(0,\infty)$ takes the form

$$\begin{cases}
dr(t) = \bigl(Ar(t) + G(t,r(t))\bigr)\,dt + \tau(t,r(t))\,dW(t), \\
r(0) = r_0.
\end{cases} \tag{1.14}$$

The latter equation, however, does not need to have classical solutions unless further regularity conditions are imposed on the data (see below). In general we define a solution to (1.14) in the mild sense as a solution to (1.12). The relationship between the two equations is clarified by the next theorem, which follows from the general theory developed in Da Prato and Zabczyk (1992).

Theorem 1.7 Assume that the functions $\tau_k$, $k \le d$, satisfy the assumptions of Theorem 1.5 and let $r$ be a solution to (1.12). Then the following holds.

(i) Equation (1.13) holds $x$-a.e. if and only if $\tau_k(t,\cdot) \in H^1_\alpha$ for all $t \ge 0$ and $r_0 \in H^1_\alpha$.

(ii) There exist sequences $\{\tau^n_k(t,\cdot)\}, \{r^n_0\} \subset H^1_\alpha$, $k \le d$, converging in the $L^2_\alpha(0,\infty)$-norm to $\tau_k(t,\cdot)$ and $r_0$ respectively and such that the corresponding solutions $r^n$ of (1.13) satisfy the condition

$$\lim_{n\to\infty} E\int_0^T \|r^n(t) - r(t)\|_\alpha^2\,dt = 0.$$

Proof The standard proof of this theorem is omitted.

2 The BGM Model

In this section our starting point is the model of the Libor rate process proposed in Brace et al. (1997). Let $L(t,x)$ denote the Libor rate process defined by the formula

$$1 + \delta L(t,x) = \frac{P(t,t+x)}{P(t,t+x+\delta)}, \quad t, x \ge 0,$$

where $\delta > 0$ (for example $\delta = 0.25$) is fixed. We assume that all zero-coupon bonds may be expressed in terms of a certain forward rate process $r$ as in (1.2), but we shift our attention to the process $\log L(t,x)$, which is supposed to satisfy an equation

$$d\bigl(\log L(t,x)\bigr) = \alpha(t,x)\,dt + \gamma(t,x)\,dW(t), \quad x \ge 0, \tag{2.1}$$

where $W$ is a $d$-dimensional Wiener process. We need conditions on the drift term $\alpha$ which assure that there is no arbitrage. We assume that the measurable function $\gamma : [0,\infty) \times [0,\infty) \to \mathbb{R}^d$ is deterministic and

$$M_\gamma = \sup_{t,x>0}|\gamma(t,x)| + \sup_{t \ge 0,\ x \le \delta}\sum_{k=0}^{\infty}|\gamma(t,x+k\delta)| < \infty. \tag{2.2}$$

Let $l$ be a solution to the following stochastic evolution equation in $L^2_\alpha(0,\infty)$:

$$\begin{cases}
dl(t) = \bigl(Al(t) + F(t,l(t))\bigr)\,dt + \gamma(t)\,dW(t), \\
l(0) = \phi \in L^2_\alpha(0,\infty),
\end{cases} \tag{2.3}$$

where

$$F(t,\phi)(x) = \sum_{k=0}^{[x/\delta]}\frac{\delta\exp(\phi(x-k\delta))}{1 + \delta\exp(\phi(x-k\delta))}\,\langle\gamma(t,x-k\delta), \gamma(t,x)\rangle - \frac{1}{2}|\gamma(t,x)|^2.$$
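To make the definition of the Libor rate and of the drift $F$ concrete, here is a minimal sketch for the scalar case $d = 1$. The function names and the sample inputs are hypothetical and chosen here for illustration; in particular, a volatility that does not decay in $x$ would violate (2.2) and is used only to check the arithmetic at small $x$.

```python
import math


def libor_rate(p_near, p_far, delta=0.25):
    """Libor rate L from 1 + delta * L = P(t, t+x) / P(t, t+x+delta)."""
    return (p_near / p_far - 1.0) / delta


def bgm_drift(phi, gamma, t, x, delta=0.25):
    """Evaluate F(t, phi)(x) for a scalar volatility gamma(t, x).

    phi is the log-Libor curve x -> log L(x); the sum runs over
    k = 0, ..., [x / delta] as in the definition of F.
    """
    n = int(math.floor(x / delta))
    total = 0.0
    for k in range(n + 1):
        y = x - k * delta
        libor = math.exp(phi(y))
        weight = delta * libor / (1.0 + delta * libor)
        total += weight * gamma(t, y) * gamma(t, x)
    return total - 0.5 * gamma(t, x) ** 2


# Hypothetical inputs: a flat 5% Libor curve and an exponentially
# decaying volatility (neither is prescribed by the text).
flat_log_libor = lambda y: math.log(0.05)
decaying_gamma = lambda t, y: 0.2 * math.exp(-0.5 * y)

drift_at_one_year = bgm_drift(flat_log_libor, decaying_gamma, t=0.0, x=1.0)
```

The number of terms in the sum grows with $x$, which is why the summability condition (2.2) on $\gamma$ is needed to keep $F$ bounded.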

If this equation has a solution then we may define the process $L$ via the formula $l(t,x) = \log L(t,x)$. In turn (2) allows us to define the family of zero-coupon bonds and finally the forward rate process $r(t)$ can be defined provided the appropriate regularity conditions are satisfied. It was shown in Brace et al. (1997) that if $l$ is a solution to (2.3) then the corresponding process of forward rates satisfies the nonarbitrage condition (1.5).

Theorem 2.1 Assume (2.1). Then the following holds.

(a) For every $\alpha > 0$ there exists a unique solution to (2.3) in the space $L^2_\alpha(0,\infty)$.

(b) Let $\alpha \le 0$ and

$$N_\gamma^2 = \sup_{t \ge 0}\int_0^{\infty} e^{-\alpha x}|\gamma(t,x)|^2\,dx < \infty. \tag{2.4}$$

Then there exists a unique solution to (2.3) in $L^2_\alpha(0,\infty)$.

Proof Note first that

$$|F(t,\phi)(x)| \le |\gamma(t,x)|\sum_{k=0}^{[x/\delta]}|\gamma(t,x-k\delta)| + \frac{1}{2}|\gamma(t,x)|^2$$

and therefore

$$\begin{aligned}
\int_0^{\infty} e^{-\alpha x}|F(t,\phi)(x)|^2\,dx
&\le 2\int_0^{\infty} e^{-\alpha x}|\gamma(t,x)|^2\left(\sum_{k=0}^{[x/\delta]}|\gamma(t,x-k\delta)|\right)^2 dx + \frac{1}{2}\int_0^{\infty} e^{-\alpha x}|\gamma(t,x)|^4\,dx \\
&\le 2\sum_{n=0}^{\infty} e^{-\alpha\delta n}\int_0^{\delta}|\gamma(t,x+n\delta)|^2\left(\sum_{k=0}^{n}|\gamma(t,x+k\delta)|\right)^2 dx + \frac{1}{2}M_\gamma^2\int_0^{\infty} e^{-\alpha x}|\gamma(t,x)|^2\,dx.
\end{aligned} \tag{2.5}$$

Therefore, for $\alpha > 0$

$$\|F(t,\phi)\|^2 \le 2\delta M_\gamma^4\sum_{n=0}^{\infty} n^2 e^{-\alpha\delta n} + \frac{1}{2\alpha}M_\gamma^4 < \infty.$$

If $\alpha \le 0$ then (2.3), (2.4) and (2.5) yield

$$\|F(t,\phi)\|^2 \le \frac{3}{2}M_\gamma^2\|\gamma(t)\|^2.$$

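Hence, for every $\alpha \in \mathbb{R}$ the mapping $F : [0,\infty) \times L^2_\alpha(0,\infty) \to L^2_\alpha(0,\infty)$ is uniformly bounded. We will show now that

$$\|F(t,\phi) - F(t,\psi)\| \le M_F\|\phi - \psi\|, \quad \phi, \psi \in L^2_\alpha(0,\infty). \tag{2.6}$$

Since

$$\left|\frac{e^x}{1+e^x} - \frac{e^y}{1+e^y}\right| \le \frac{1}{2}|x - y|,$$

we obtain, proceeding similarly as in (2.5),

$$\begin{aligned}
\|F(t,\phi) - F(t,\psi)\|^2
&\le \frac{1}{4}\sum_{n=0}^{\infty} e^{-\alpha\delta n}\int_0^{\delta}|\gamma(t,x+n\delta)|^2\left(\sum_{k=0}^{n}|\gamma(t,x+k\delta)|\,|(\phi-\psi)(x+k\delta)|\right)^2 dx \\
&\le \frac{1}{4}M_\gamma^2\sum_{n=0}^{\infty} e^{-\alpha\delta n}\int_0^{\delta}|\gamma(t,x+n\delta)|^2\left(\sum_{k=0}^{n}(\phi-\psi)^2(x+k\delta)\right)dx.
\end{aligned} \tag{2.7}$$

Hence, if $\alpha > 0$ then

$$\begin{aligned}
\|F(t,\phi) - F(t,\psi)\|^2
&\le \frac{1}{4}M_\gamma^4\int_0^{\delta}\sum_{n=0}^{\infty} e^{-\alpha\delta n}\left(\sum_{k=0}^{n}(\phi-\psi)^2(x+k\delta)\right)dx \\
&= \frac{1}{4}M_\gamma^4\int_0^{\delta}\sum_{k=0}^{\infty}(\phi-\psi)^2(x+k\delta)\sum_{n=k}^{\infty} e^{-\alpha\delta n}\,dx \\
&= \frac{M_\gamma^4}{4\left(1-e^{-\alpha\delta}\right)}\sum_{k=0}^{\infty} e^{-\alpha\delta k}\int_0^{\delta}(\phi-\psi)^2(x+k\delta)\,dx \\
&\le \frac{M_\gamma^4}{4\left(1-e^{-\alpha\delta}\right)}\,e^{\alpha\delta}\sum_{k=0}^{\infty}\int_{k\delta}^{(k+1)\delta} e^{-\alpha x}(\phi-\psi)^2(x)\,dx \\
&= \frac{M_\gamma^4}{4\left(1-e^{-\alpha\delta}\right)}\,e^{\alpha\delta}\,\|\phi-\psi\|^2
\end{aligned}$$

and (2.6) follows. Assume now that $\alpha \le 0$. Then by the first inequality in (2.7)

$$\begin{aligned}
\|F(t,\phi) - F(t,\psi)\|^2
&\le \frac{1}{4}\sum_{n=0}^{\infty} e^{-\alpha\delta n}\int_0^{\delta}|\gamma(t,x+n\delta)|^2\left(\sum_{k=0}^{n}|\gamma(t,x+k\delta)|\,|(\phi-\psi)(x+k\delta)|\right)^2 dx \\
&\le \frac{1}{4}N_\gamma^2\int_0^{\delta}\left(\sum_{k=0}^{\infty}|\gamma(t,x+k\delta)|^2\right)\left(\sum_{k=0}^{\infty} e^{-\alpha\delta k}(\phi-\psi)^2(x+k\delta)\right)dx \\
&\le \frac{1}{4}N_\gamma^4\sum_{k=0}^{\infty}\int_{k\delta}^{(k+1)\delta} e^{-\alpha x}(\phi-\psi)^2(x)\,dx = \frac{1}{4}N_\gamma^4\,\|\phi-\psi\|^2.
\end{aligned}$$

Finally, Theorem 7.4 in Da Prato and Zabczyk (1992) yields existence of a unique solution to equation (2.3).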

3 Kolmogorov equations

The classical Black–Scholes formula for a European option price has been derived by solving a partial differential equation identified by means of heuristic arguments (cf. Black and Scholes 1973). Later on a probabilistic interpretation of the above arguments allowed the derivation to be made rigorous (Harrison and Pliska 1981). Let us recall briefly the main ideas of this approach. Assume that the price $X(t)$ of a stock is a positive continuous semimartingale such that the logarithm of the stock price has a deterministic quadratic variation $\langle \log X\rangle_t = \sigma^2 t$. Then some mild technical conditions imply existence of a unique probability measure under which for every $t \ge 0$

$$X(t) = X_0 + \int_0^t rX(s)\,ds + \int_0^t \sigma X(s)\,dW(s).$$

Moreover, for a given maturity $T$ and a strike price $K$ we can calculate the price of a European put option by taking the conditional expectation of the discounted option payoff, i.e.,

$$V_T(t,x) = e^{-r(T-t)}E\left((K - X(T))^+ \mid X(t) = x\right)$$

for $t \le T$. Since $X$ is a strong Feller process with the infinitesimal generator

$$L = rx\frac{\partial}{\partial x} + \frac{1}{2}\sigma^2x^2\frac{\partial^2}{\partial x^2}$$

we can apply the Feynman–Kac formula and identify the function $V_T$ with a unique solution of the backward Kolmogorov equation

$$\frac{\partial u}{\partial t}(t,x) + \frac{1}{2}\sigma^2x^2\frac{\partial^2 u}{\partial x^2}(t,x) + rx\frac{\partial u}{\partial x}(t,x) - ru(t,x) = 0 \tag{3.1}$$

with the terminal condition $u(T,x) = (K - x)^+$. In this section we investigate whether this strategy can be applied to interest rate options in general term structure models. Consider a European swaption, an option with maturity $T$ on a swap with the cashflows $C_i$, $i = 1, \ldots, n$ at times $T_i$, $i = 1, 2, \ldots, n$ such that T < T1
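The Feynman–Kac identification recalled above for the Black–Scholes put can be checked numerically. The sketch below evaluates the standard closed-form put price and measures the residual of equation (3.1) with central finite differences. The function names, the step size `h` and the sample parameters are assumptions made here for illustration.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_put(t, x, K, T, r, sigma):
    """Closed-form Black-Scholes price V_T(t, x) of a European put."""
    tau = T - t
    if tau <= 0.0:
        return max(K - x, 0.0)
    d1 = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return K * math.exp(-r * tau) * norm_cdf(-d2) - x * norm_cdf(-d1)


def pde_residual(t, x, K, T, r, sigma, h=1e-3):
    """Left-hand side of (3.1) for u = bs_put, via central differences."""
    u = lambda s, y: bs_put(s, y, K, T, r, sigma)
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2.0 * h)
    u_xx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / h**2
    return u_t + 0.5 * sigma**2 * x**2 * u_xx + r * x * u_x - r * u(t, x)


# Hypothetical parameters: at-the-money put, one year to maturity.
price = bs_put(0.25, 100.0, K=100.0, T=1.25, r=0.05, sigma=0.2)
residual = pde_residual(0.25, 100.0, K=100.0, T=1.25, r=0.05, sigma=0.2)
```

Away from the kink of the payoff at maturity the residual should be at the level of the discretization error, which is the finite dimensional picture that the rest of the section tries to reproduce on the function space $L^2_\alpha(0,\infty)$.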

0

$$\sup_{t \le T} E\|r(t,\zeta)\|^p \le C_{T,p}\left(1 + E\|\zeta\|^p\right).$$

If $\tau(t,\cdot)$ is Fréchet differentiable on $L^2_\alpha(0,\infty)$ then for every $t \ge 0$ the mapping $\phi \to r(t,\phi)$ is Fréchet differentiable $P$-a.s. In general the solution to (3.5) is not a semimartingale but for every $\psi \in \mathrm{dom}(A^*) = \{\phi \in H^1 : \phi(0) = 0\}$

$$\langle r(t), \psi\rangle = \langle\phi, \psi\rangle + \int_0^t \langle r(s), A^*\psi\rangle\,ds + \int_0^t \langle F(s,r(s)), \psi\rangle\,ds + \int_0^t \langle G^*(s,r(s))\psi, dW(s)\rangle \tag{3.3}$$

and hence $\langle r(t), \psi\rangle$ is a semimartingale and so is the multidimensional process $\left(\langle r(t), \psi_1\rangle, \ldots, \langle r(t), \psi_n\rangle\right)$ for any $n$ and arbitrary collection of $\psi_1, \ldots, \psi_n \in \mathrm{dom}(A^*)$. It follows that the process $r$ is an $L^2([0,T] \times \Omega, \lambda \otimes P)$-limit of semimartingales for every $T > 0$. This property will be used later on in the discussion of the Kolmogorov equation. The following property of the process

$$R(t,\phi) = \int_0^t r(s,\phi)\,ds$$

will be useful.

Lemma 3.1 For every $T > 0$ there exists $c_T > 0$ such that

$$\sup_{t \le T} E\|R(t,\phi) - R(t,\psi)\|_1 \le c_T\|\phi - \psi\|.$$

Proof The standard proof of this lemma is omitted.

Let us go back now to the problem of pricing interest rate dependent options. To begin with, note that in the present terminology the price of a zero-coupon bond can be rewritten as follows. Let

$$B_T(t,\phi) = e^{-\langle\phi,\ S(t)I_{[0,T]}\rangle},$$

with $I_{[0,T]}$ denoting the indicator function of the interval $[0,T]$. It follows that $P(t,T) = B_T(t,r(t))$. Any measurable mapping $F : L^2_\alpha(0,\infty) \to \mathbb{R}$ such that $\sup E\,|F(r(T))|\exp\left(-\int_0^T r(u,0)\,du\right)$ 0 and $a > 0$

$$\sup_{\|\phi\| \le a} E\exp\left(2p\,\|r(t,\phi)\| - 2\int_0^t \delta(r(s,\phi))\,ds\right) < \infty.$$

If $r(t,\phi) \in H^1$ for every $t \ge 0$ and $\phi \in H^1$ then we will need an $H^1$-version of (A):

(A′) We assume $\alpha = 0$. Moreover, there exists $p \ge 0$ such that for every $t > 0$ and $a > 0$

$$\sup_{\|\phi\| \le a} E\exp\left(2p\,\|r(t,\phi)\|_1 - 2\int_0^t \delta(r(s,\phi))\,ds\right) < \infty.$$

We will show that (A′) holds if $r$ is a Gaussian process. If the process $r$ is nonnegative then the results presented below are valid and the assumption (A) is not needed. In general the term $\exp\left(-\int_0^t \delta(r(s,\phi))\,ds\right)$ can grow exponentially.

Proposition 3.2 If (A) holds for a certain $p \ge 0$ then, putting $H = L^2(0,\infty)$, $P_t^\delta C_p(H) \subset C(H)$ and $P_t^\delta C_p(H^1) \subset C(H^1)$ for every $t \ge 0$.

Proof We provide the proof for $H^1$ only. Let $F \in C_p(H^1)$ and let $\{\phi_n\} \subset H^1$ be a sequence converging in $H^1$ to $\phi$. Then $F(\phi) = e^{-p\|\phi\|_1}G(\phi)$ with $G \in C_0(H^1)$ and

$$\left|P_t^\delta F(\phi)\right| \le \|G\|_0\,E\exp\left(p\,\|r(t,\phi)\|_1 - \int_0^t \delta(r(u,\phi))\,du\right).$$

Hence in view of (A′) $P_t^\delta F(\phi)$ is well defined. Moreover, (A′) yields uniform integrability of the family of random variables

$$\left\{\exp\left(p\,\|r(t,\phi)\|_1 - \int_0^t \delta(r(u,\phi))\,du\right) : \|\phi\| \le a\right\}$$

for every $a > 0$. Hence the proposition follows from the continuity of $F$ and Lemma (3.3).

Remark 3.3 The above theorem may be proved for any $\alpha \in \mathbb{R}$. However, the Kolmogorov equation we are going to study next is simpler in $L^2(0,\infty)$.

We shall identify the infinitesimal generator $L$ of the Markov process $r$. Because the process $r$ is not a semimartingale we cannot apply the Itô formula to the function $F(r(t,\phi))$ even if $F \in C_p^2(H_\alpha)$. However, it turns out that the property (3.3) is sufficient for our needs. Let $\psi_1, \ldots, \psi_n \in \mathrm{dom}(A^*)$ and let $P_n$ denote the orthogonal projection on the linear span $H_n$ of the vectors $\psi_1, \ldots, \psi_n$. First, let us define the space

$$D_0 = \left\{F \in C_p(H_\alpha) : F = f \circ P_n,\ f \in C_p^2(\mathbb{R}^n),\ n = 1, \ldots\right\}.$$

If $F \in D_0$ then in view of (3.3) the process $F(r(t,\phi))$ is a semimartingale and

$$F(r(t,\phi)) = F(\phi) + \int_0^t LF(r(s,\phi))\,ds + \int_0^t \langle DF(r(s,\phi)),\ \tau(r(s,\phi))\,dW(s)\rangle, \tag{3.6}$$

where

$$LF(\phi) = \frac{1}{2}\langle D^2F(\phi)\tau(\phi), \tau(\phi)\rangle + \langle\phi, A^*DF(\phi)\rangle + \langle G(\phi), DF(\phi)\rangle.$$

If $F \in D_0$ then the function $A^*DF(\phi)$ is well-defined for all $\phi \in L^2(0,\infty)$ and therefore $LF(\phi)$ is a well-defined continuous function on $L^2(0,\infty)$. The above

considerations show that the generator of the Markov process $r$ coincides on $D_0$ with the operator $L$. Therefore we can expect that $V_T$ as defined in (3.5) is a Feynman–Kac formula for the solution of the following equation

$$\begin{cases}
\dfrac{\partial u}{\partial t}(t,\phi) + Lu(t,\phi) - \delta(\phi)u(t,\phi) = 0, \\[1.5ex]
u(T,\phi) = F(\phi).
\end{cases} \tag{3.7}$$

In other words the operator $L^\delta = L - \delta$, when considered on an appropriate domain, is a generator of the semigroup $P_t^\delta$. However, equation (3.7) is not valid in general because $V_T(t,\cdot)$ need not be differentiable.

Proposition 3.4 Assume that $\tau$ and $G$ are twice differentiable on $H$. Then for every $F \in C_p^2(H)$ the function $V_T$ is a unique solution of the backward Kolmogorov equation (3.7) in the following sense.

• The function $V_T : [0,\infty) \times H \to \mathbb{R}$ is bounded and continuous with respect to each variable.
• For every $t \ge 0$ we have $V_T(t,\cdot) \in C^2(H)$.
• We have $V_T \in C^1([0,T], H^1)$.
• Equation (3.7) holds for every $\phi \in \mathrm{dom}(A)$ and $t \ge 0$.

Moreover, $V_T$ is given by (3.5).

Proof Let $\delta_n$ denote a sequence of $C^2$ functions on $\mathbb{R}$ such that $\langle\delta_n, \phi\rangle \to \delta(\phi)$ for every continuous $\phi$ and let $L_n = L - \delta_n$. If we denote by $P_t^n$ the semigroup

$$P_t^n F(\phi) = E\left[\exp\left(-\int_0^t \langle\delta_n, r(u,\phi)\rangle\,du\right)F(r(t,\phi))\right]$$

then by a simple modification of the proof of Theorem 9.17 in Da Prato and Zabczyk (1992) we can show, putting $u^n(t,\phi) = P_t^n F(\phi)$, that

$$\begin{cases}
\dfrac{\partial u^n}{\partial t}(t,\phi) + Lu^n(t,\phi) - \langle\delta_n, \phi\rangle u^n(t,\phi) = 0, \\[1.5ex]
u^n(T,\phi) = F(\phi),
\end{cases} \tag{3.8}$$

and moreover $u^n$ is a unique solution of (3.8). We shall show first that for every $\phi \in H$

$$\lim_{n\to\infty} P_t^n F(\phi) = P_t^\delta F(\phi). \tag{3.9}$$

Indeed,

$$\left|P_t^n F(\phi) - P_t^\delta F(\phi)\right| \le \|F\|_p\,E\left(e^{p\|r(t,\phi)\|}\left|\exp\left(-\int_0^t \langle\delta_n, r(u,\phi)\rangle\,du\right) - \exp\left(-\int_0^t \delta(r(u,\phi))\,du\right)\right|\right)$$

and therefore (A) and the definition of $\delta_n$ yield (3.9). Using (3.9) and Theorem 9.16 in Da Prato and Zabczyk (1992) we obtain easily that the right-hand side of (3.8) converges (along the subsequence $n_k$) to the expression $LP_t^\delta F(\phi) - \delta(\phi)P_t^\delta F(\phi)$ for every $\phi \in H^1_\alpha$ uniformly in $t \le T$. Hence

$$\lim_{k\to\infty}\frac{\partial u^{n_k}}{\partial t}(t,\phi) = \frac{\partial P_t^\delta F}{\partial t}(\phi)$$

and therefore $P_t^\delta F$ satisfies (3.7).

Unfortunately, the assumptions of this theorem are too strong for it to be applicable to some important contingent claims like swaptions. Stronger results can be obtained in the Gaussian case.

Proposition 3.5 The mapping $u$ is a solution of (3.7) if and only if $u(t,\phi) = B_T(t,\phi)R_T(t,\phi)$, $R_T(T,\phi) = F(\phi)$ and

$$\frac{\partial R_T}{\partial t}(t,\phi) + \frac{1}{2}\langle D^2R_T(t,\phi)\tau(\phi), \tau(\phi)\rangle + \langle DR_T(t,\phi), A\phi + G(\phi)\rangle - \langle DR_T(t,\phi), \tau(\phi)\rangle\,\langle\tau(\phi), S(t)I_{[0,T]}\rangle = 0, \tag{3.10}$$

where the solution is defined in the sense of Proposition 3.4.

Proof Let $u$ satisfy (3.7) and define the function $R_T$ by the formula $u(t,\phi) = B_T(t,\phi)R_T(t,\phi)$. Then $R_T$ is smooth and

$$\frac{\partial u}{\partial t}(t,\phi) = \phi(T-t)B_T(t,\phi)R_T(t,\phi) + B_T(t,\phi)\frac{\partial R_T}{\partial t}(t,\phi), \tag{3.11}$$

$$Du(t,\phi) = -B_T(t,\phi)R_T(t,\phi)\,S(t)I_{[0,T]} + B_T(t,\phi)\,DR_T(t,\phi), \tag{3.12}$$

$$D^2u(t,\phi) = B_T(t,\phi)R_T(t,\phi)\,S(t)I_{[0,T]} \otimes S(t)I_{[0,T]} - 2B_T(t,\phi)\,DR_T(t,\phi) \otimes S(t)I_{[0,T]} + B_T(t,\phi)\,D^2R_T(t,\phi). \tag{3.13}$$

Hence by (3.12)

$$\langle Du(t,\phi), A\phi + G(\phi)\rangle = -B_T(t,\phi)R_T(t,\phi)\left(\phi(T-t) - \phi(0) + \frac{1}{2}\langle S(t)I_{[0,T]}, \tau(\phi)\rangle^2\right) + B_T(t,\phi)\,\langle DR_T(t,\phi), A\phi + G(\phi)\rangle \tag{3.14}$$

and by (3.13)

$$\langle D^2u(t,\phi)\tau(\phi), \tau(\phi)\rangle = B_T(t,\phi)R_T(t,\phi)\,\langle S(t)I_{[0,T]}, \tau(\phi)\rangle^2 - 2B_T(t,\phi)\,\langle DR_T(t,\phi), \tau(\phi)\rangle\,\langle S(t)I_{[0,T]}, \tau(\phi)\rangle + B_T(t,\phi)\,\langle D^2R_T(t,\phi)\tau(\phi), \tau(\phi)\rangle. \tag{3.15}$$

Finally, taking into account (3.11), (3.14) and (3.15) we find that

$$\begin{aligned}
&\frac{\partial u}{\partial t}(t,\phi) + \frac{1}{2}\langle D^2u(t,\phi)\tau(\phi), \tau(\phi)\rangle + \langle Du(t,\phi), A\phi + G(\phi)\rangle - \delta(\phi)u(t,\phi) \\
&\quad = B_T(t,\phi)\left(\frac{\partial R_T}{\partial t}(t,\phi) + \frac{1}{2}\langle D^2R_T(t,\phi)\tau(\phi), \tau(\phi)\rangle + \langle DR_T(t,\phi), A\phi + G(\phi)\rangle - \langle DR_T(t,\phi), \tau(\phi)\rangle\,\langle S(t)I_{[0,T]}, \tau(\phi)\rangle\right)
\end{aligned}$$

and (3.10) follows. Using similar arguments we show that if $R_T$ satisfies (3.10) then $u(t,\phi) = B_T(t,\phi)R_T(t,\phi)$ is a solution to (3.7).

Remark 3.6 Proposition 3.5 describes the forward measure transformation performed at the level of the Kolmogorov equation. Note that equation (3.10) is the Kolmogorov equation for the process $Y$ (say) defined as a solution to the stochastic differential equation

$$dY = \bigl(AY + G_\sigma(Y) - \langle\tau(Y), S(t)I_T\rangle\,\tau(Y)\bigr)\,dt + \tau(Y)\,dW$$

or in a more explicit form

$$dY(t,x) = \left(\frac{\partial Y}{\partial x}(t,x) + \tau(Y(t))(x)\int_0^x \tau(Y(t))(u)\,du\right)dt - \tau(Y(t))(x)\int_0^{T-t}\tau(Y(t))(u)\,du\,dt + \tau(Y(t))(x)\,dW(t).$$

From this point on we assume that $\tau \in H$ is a constant vector and therefore

$$r(t) = S(t)\phi + \int_0^t S(s)G\,ds + \int_0^t S(t-s)\tau\,dW(s).$$

This case has been discussed in Musiela (1993) and Brace and Musiela (1994). For every $t \ge 0$ the random variable $r(t)$ is Gaussian with the mean

$$E\,r(t) = S(t)\phi + \int_0^t S(s)G\,ds$$

and the covariance operator

$$Q_t = \int_0^t S(s)\tau\tau^*S^*(s)\,ds.$$

Moreover, because $r(t,\phi)$ is Gaussian so is $R(t,\phi)(0)$. Hence, using the Hölder inequality we check by direct calculations that for $t \le T$

$$E\exp\bigl(2p\,\|r(t,\phi)\|_\alpha - 2R(t,\phi)(0)\bigr) \le C_T\exp\bigl(\beta_T\|\phi\|\bigr)$$

for some constants $C_T, \beta_T > 0$. Therefore (A) holds. In the present framework equation (3.7) may be written in the form

$$\begin{cases}
\dfrac{\partial u}{\partial t}(t,\phi) = \dfrac{1}{2}\langle D^2u(t,\phi)\tau, \tau\rangle + \langle A\phi + G(\phi), Du(t,\phi)\rangle - \delta(\phi)u(t,\phi), \\[1.5ex]
u(0,\phi) = F(\phi), \quad \phi \in \mathrm{dom}(A).
\end{cases} \tag{3.16}$$

We shall need the finite dimensional parabolic PDE

$$\frac{\partial h}{\partial t}(t, x_1, \ldots, x_n) + \frac{1}{2}\sum_{i,j=1}^{n} b_i^*(t)b_j(t)\,x_ix_j\,\frac{\partial^2 h}{\partial x_i\partial x_j}(t, x_1, \ldots, x_n) = 0 \tag{3.17}$$

with the terminal condition $h(T, x_1, \ldots, x_n) = h_0(x_1, \ldots, x_n)$ and

$$b_i^*(t)b_j(t) = \int_{T-t}^{T_i-t}\tau^*(x)\,dx\int_{T-t}^{T_j-t}\tau(x)\,dx.$$

Equation (3.17) has a unique solution for every measurable terminal condition $h_0$ with linear growth. Let

$$F_{T,T_i}(t,\phi) = \exp\bigl(-\langle S(t)I_{T,T_i}, \phi\rangle\bigr),$$

where $I_{T,T_i}$ is the indicator function of the interval $[T, T_i]$.

Theorem 3.7 If the function $U(t, x_1, \ldots, x_n)$ is a solution to (3.17) with the terminal condition $U_0(x_1, \ldots, x_n)$ then the function

$$u(t,\phi) = B_T(t,\phi)\,U\bigl(t, F_{T,T_1}(t,\phi), \ldots, F_{T,T_n}(t,\phi)\bigr)$$

is a solution to the Cauchy problem (3.6) with the terminal condition

$$u(T,\phi) = U_0\bigl(B_{T_1}(T,\phi), \ldots, B_{T_n}(T,\phi)\bigr).$$

Proof It is enough to consider the case $n = d = 1$. The general argument is exactly the same. In view of Proposition 3.5 we need to show that the function

$$R(t,\phi) = U\bigl(t, F_{T,T_1}(t,\phi), \ldots, F_{T,T_n}(t,\phi)\bigr) \tag{3.18}$$

is a solution to equation (3.10). Note first that

$$\frac{d}{dt}F_{T,T_1}(t,\phi) = \bigl(\phi(T_1-t) - \phi(T-t)\bigr)F_{T,T_1}(t,\phi),$$

$$DF_{T,T_1}(t,\phi) = -F_{T,T_1}(t,\phi)\,l_t$$

with $l_t = I_{[T-t,\,T_1-t]}$ and

$$D^2F_{T,T_1}(t,\phi) = F_{T,T_1}(t,\phi)\,l_t \otimes l_t.$$

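Hence, denoting $l = I_{[0,T-t]}$ and writing $F = F_{T,T_1}(t,\phi)$ for brevity, we find that for $\phi \in \mathrm{dom}(A)$

$$\frac{\partial R}{\partial t}(t,\phi) = \frac{\partial U}{\partial t}(t,F) + F\,\bigl(\phi(T_1-t) - \phi(T-t)\bigr)\frac{\partial U}{\partial x}(t,F) \tag{3.19}$$

and

$$DR(t,\phi) = -F\,\frac{\partial U}{\partial x}(t,F)\,l_t.$$

Hence

$$\langle DR(t,\phi), \tau\rangle = -F\,\frac{\partial U}{\partial x}(t,F)\,\langle l_t, \tau\rangle \tag{3.20}$$

and

$$\begin{aligned}
\langle DR(t,\phi), A\phi + G_\sigma\rangle
&= -F\,\frac{\partial U}{\partial x}(t,F)\,\langle l_t, A\phi + G_\sigma\rangle \\
&= -F\,\frac{\partial U}{\partial x}(t,F)\bigl(\phi(T_1-t) - \phi(T-t)\bigr) - F\,\frac{\partial U}{\partial x}(t,F)\int_{T-t}^{T_1-t}\frac{1}{2}\frac{d}{dx}\left(\int_0^x \tau(u)\,du\right)^2 dx \\
&= -F\,\frac{\partial U}{\partial x}(t,F)\bigl(\phi(T_1-t) - \phi(T-t)\bigr) - \frac{1}{2}F\,\frac{\partial U}{\partial x}(t,F)\left[\left(\int_0^{T_1-t}\tau(u)\,du\right)^2 - \left(\int_0^{T-t}\tau(u)\,du\right)^2\right].
\end{aligned}$$

Since $\int_0^{T_1-t}\tau(u)\,du = \langle\tau, l\rangle + \langle\tau, l_t\rangle$, we thereby obtain

$$\langle DR(t,\phi), A\phi + G\rangle = -F\,\frac{\partial U}{\partial x}(t,F)\bigl(\phi(T_1-t) - \phi(T-t)\bigr) - \frac{1}{2}F\,\frac{\partial U}{\partial x}(t,F)\,\langle\tau, l_t\rangle^2 - F\,\frac{\partial U}{\partial x}(t,F)\,\langle\tau, l\rangle\,\langle\tau, l_t\rangle. \tag{3.21}$$

Next

$$D^2R(t,\phi) = F\,\frac{\partial U}{\partial x}(t,F)\,l_t \otimes l_t + F^2\,\frac{\partial^2 U}{\partial x^2}(t,F)\,l_t \otimes l_t.$$

Hence

$$\langle D^2R(t,\phi)\tau, \tau\rangle = F\,\frac{\partial U}{\partial x}(t,F)\,\langle l_t, \tau\rangle^2 + F^2\,\frac{\partial^2 U}{\partial x^2}(t,F)\,\langle l_t, \tau\rangle^2. \tag{3.22}$$

Now, taking into account (3.19), (3.20), (3.21) and (3.22) we find that

$$\begin{aligned}
&\frac{\partial R}{\partial t}(t,\phi) + \frac{1}{2}\langle D^2R_T(t,\phi)\tau(\phi), \tau(\phi)\rangle + \langle DR_T(t,\phi), A\phi + G_\sigma(\phi)\rangle - \langle DR_T(t,\phi), \tau(\phi)\rangle\,\langle\tau(\phi), S(t)I_T\rangle \\
&\quad = \frac{\partial U}{\partial t}(t,F) + \frac{1}{2}F^2\,\frac{\partial^2 U}{\partial x^2}(t,F)\,\langle l_t, \tau\rangle^2,
\end{aligned}$$

where $R(t,\phi)$ is defined by (3.18). Therefore, by (3.17) the function $R$ satisfies equation (3.10) and the theorem follows.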
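For $n = d = 1$ equation (3.17) reads $\partial h/\partial t + \frac{1}{2}b^2(t)x^2\,\partial^2 h/\partial x^2 = 0$, which is the Kolmogorov equation of a driftless lognormal diffusion. Its solution can therefore be written as an expectation of the terminal condition, with total variance $v = \int_t^T b^2(s)\,ds$. The sketch below compares this expectation with the closed-form Black-type formula for a put-style terminal condition $h_0(x) = (K - x)^+$. The payoff, the function names, the quadrature grid and the sample volatility are all assumptions made here for illustration and are not prescribed by the text.

```python
import numpy as np
from math import erf, log, sqrt


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def total_variance(b, t, T, n=20_001):
    """v = int_t^T b(s)^2 ds by the trapezoidal rule."""
    s = np.linspace(t, T, n)
    y = b(s) ** 2
    ds = s[1] - s[0]
    return float(ds * (y.sum() - 0.5 * (y[0] + y[-1])))


def black_put(x, K, v):
    """Closed-form h(t, x) for h_0(x) = max(K - x, 0) and total variance v."""
    d1 = (log(x / K) + 0.5 * v) / sqrt(v)
    d2 = d1 - sqrt(v)
    return K * norm_cdf(-d2) - x * norm_cdf(-d1)


def expectation_put(x, K, v, n=16_001, z_max=8.0):
    """E[h_0(x exp(-v/2 + sqrt(v) Z))], Z standard normal, by quadrature."""
    z = np.linspace(-z_max, z_max, n)
    payoff = np.maximum(K - x * np.exp(-0.5 * v + sqrt(v) * z), 0.0)
    y = payoff * np.exp(-0.5 * z**2) / sqrt(2.0 * np.pi)
    dz = z[1] - z[0]
    return float(dz * (y.sum() - 0.5 * (y[0] + y[-1])))


# Hypothetical constant volatility b(s) = 0.2 over one year.
v = total_variance(lambda s: 0.2 * np.ones_like(s), 0.0, 1.0)
closed_form = black_put(1.0, 1.0, v)
by_quadrature = expectation_put(1.0, 1.0, v)
```

In the setting of Theorem 3.7 the variable $x$ plays the role of the ratio $F_{T,T_1}(t,\phi)$ and $b(t) = \int_{T-t}^{T_1-t}\tau(x)\,dx$, so a one-dimensional routine of this kind is all that is needed once the infinite dimensional problem has been reduced to (3.17).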

References

Black, F. and Scholes, M. (1973), The pricing of options and corporate liabilities, J. Political Economy 81 637–59
Brace, A., Gątarek, D. and Musiela, M. (1997), The market model of interest rate dynamics, Math. Finance 7 127–54
Brace, A. and Musiela, M. (1994), A multifactor Gauss–Markov implementation of Heath, Jarrow and Morton, Math. Finance 4 259–83
Da Prato, G. and Zabczyk, J. (1992), Stochastic equations in infinite dimensions, Cambridge University Press
Goldys, B., Musiela, M. and Sondermann, D. (1995), Lognormality of rates and term structure models, preprint, UNSW
Gątarek, D. and Święch, A. (1997), Optimal stopping in Hilbert spaces and pricing of American options, a preprint
Hamza, K. and Klebaner, F.C. (1995), A stochastic partial differential equation for term structure of interest rates, a preprint
Harrison, J.M. and Pliska, S.R. (1981), Martingales and stochastic integrals in the theory of continuous trading, Stochastic Process. Appl. 11 215–60
Heath, D., Jarrow, R. and Morton, A. (1992), Bond pricing and the term structure of interest rates: a new methodology, Econometrica 60(1) 77–105
Kennedy, D.P. (1994), The term structure of interest rates as a Gaussian random field, Math. Finance 4 247–58
Musiela, M. (1993), Stochastic PDEs and term structure models, Journées Internationales de Finance, IGR-AFFI, La Baule
Santa-Clara, P. and Sornette, D. (1997), The dynamics of the forward interest rate curve with stochastic string shocks, preprint, UCLA

10 Modelling of Forward Libor and Swap Rates Marek Rutkowski

1 Introduction

The last decade was marked by rapidly growing interest in the arbitrage-free modelling of the bond market. Undoubtedly, one of the major achievements in this area was the new approach to term structure modelling proposed by Heath, Jarrow and Morton in their work published in 1992, commonly known as the HJM methodology. One of its main features is that it covers a large variety of previously proposed models and provides a unified approach to the modelling of instantaneous interest rates and to the valuation of interest-rate sensitive derivatives. Let us give a very concise description of the HJM approach (for a detailed account we refer, for instance, to Chapter 13 in Musiela and Rutkowski (1997a)).

The HJM methodology is based on an exogenous specification of the dynamics of instantaneous, continuously compounded forward rates $f(t,T)$. For any fixed maturity $T \le T^*$, the dynamics of the forward rate $f(t,T)$ are

$$df(t,T) = \alpha(t,T)\,dt + \sigma(t,T)\cdot dW_t,$$

where $\alpha$ and $\sigma$ are adapted stochastic processes with values in $\mathbb{R}$ and $\mathbb{R}^d$, respectively, and $W$ is a $d$-dimensional standard Brownian motion with respect to the underlying probability measure $\mathbb{P}$, which plays the role of the real-world probability. More formally, for every fixed $T \le T^*$, where $T^* > 0$ is the horizon date, we have

$$f(t,T) = f(0,T) + \int_0^t \alpha(u,T)\,du + \int_0^t \sigma(u,T)\cdot dW_u$$

for some Borel-measurable function $f(0,\cdot) : [0,T^*] \to \mathbb{R}$ and stochastic processes $\alpha(\cdot,T)$ and $\sigma(\cdot,T)$. Let us notice that, for any fixed maturity date $T \le T^*$, the initial condition $f(0,T)$ is determined by the current value of the continuously compounded forward rate for the future date $T$ which prevails at time 0. In practical terms, the function $f(0,T)$ is determined by the current yield curve, which can be estimated on the basis of observed market prices of bonds (and other relevant instruments).

Let us denote by $B(t,T)$ the price at time $t \le T$ of a unit zero-coupon bond which matures at the date $T \le T^*$. In the present setup the price $B(t,T)$ can be recovered from the formula

$$B(t,T) = \exp\Big(-\int_t^T f(t,u)\,du\Big).$$
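The bond price formula above can be illustrated numerically. The following is a minimal sketch, assuming the forward curve $f(t,\cdot)$ is available as a Python callable; the function name `bond_price`, the trapezoidal rule and the flat 5% curve are illustrative assumptions, not part of the original text.

```python
import math


def bond_price(forward_curve, t, T, steps=1000):
    """Approximate B(t, T) = exp(-int_t^T f(t, u) du) with the trapezoidal rule.

    forward_curve is a hypothetical callable u -> f(t, u); in practice it
    would be estimated from observed market prices of bonds.
    """
    if T < t:
        raise ValueError("maturity T must not precede t")
    if T == t:
        return 1.0
    h = (T - t) / steps
    # Trapezoidal rule: half weights at both end points, full weights inside.
    total = 0.5 * (forward_curve(t) + forward_curve(T))
    for i in range(1, steps):
        total += forward_curve(t + i * h)
    return math.exp(-h * total)


# Flat 5% forward curve: the integral over [0, 2] equals 0.05 * 2 = 0.1.
flat = lambda u: 0.05
price = bond_price(flat, 0.0, 2.0)
```

For a flat curve the quadrature is exact up to rounding, so `price` agrees with `math.exp(-0.1)`; for a curved input the accuracy depends on `steps`.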

The problem of the absence of arbitrage opportunities in the bond market can be formulated in terms of the existence of a suitably defined martingale measure. It appears that in an arbitrage-free setting (that is, under the martingale measure) the drift coefficient $\alpha$ in the dynamics of the instantaneous forward rate is uniquely determined by the volatility coefficient $\sigma$ and a stochastic process which can be interpreted as the market price of the interest-rate risk. If we denote by $\mathbb{P}^*$ the martingale measure for the bond market, and by $W^*$ the associated standard Brownian motion, then

$$dB(t,T) = B(t,T)\big(r_t\,dt + b(t,T)\cdot dW_t^*\big),$$

where $r_t = f(t,t)$ is the short-term interest rate, and the bond price volatility $b(t,T)$ satisfies

$$b(t,T) = -\int_t^T \sigma(t,u)\,du. \tag{1.1}$$

Furthermore, it appears that in the special case when the coefficient $\sigma$ is a deterministic function, the valuation formulae for interest rate-sensitive derivatives are independent of the choice of the risk premium. In this sense, the choice of a particular model from the broad class of HJM models hinges uniquely on the specification of the volatility coefficient $\sigma$.

The HJM methodology has proved very successful from both the theoretical and the practical viewpoints. Since the HJM approach to term structure modelling is based on arbitrage-free dynamics of the instantaneous continuously compounded forward rates, it requires a certain degree of smoothness with respect to the tenor of the bond prices and their volatilities. For this reason, working with such models is not always convenient. An alternative construction of an arbitrage-free family of bond prices, making no reference to the instantaneous rates, is in some circumstances more suitable. The first step in this direction was taken by Sandmann and Sondermann (1993), who focused on the effective annual interest rate. This approach was further developed in ground-breaking papers by Miltersen et al. (1997) and Brace et al. (1997), who proposed to model instead the family of forward Libor rates. The main goal was to produce an arbitrage-free term structure model which would support the common practice of pricing such interest-rate derivatives as caps and swaptions through a suitable version of Black's formula. This practical requirement enforces the lognormality of the forward Libor (or swap) rate under the corresponding forward martingale measure.

It is interesting to notice that Brace et al. (1997) parametrize their version of the lognormal forward Libor model introduced by Miltersen et al. (1997) with a piecewise constant volatility function. They need to consider smooth volatility functions in order to analyse the model in the HJM framework, however. The backward induction approach to the modelling of forward Libor and swap rates developed in Musiela and Rutkowski (1997a) and Jamshidian (1997) overcomes this technical difficulty. In addition, in contrast to the previous papers, it also allows for the modelling of forward Libor (and swap) rates associated with accrual periods of differing lengths.

It should be stressed that a similar (but not identical) approach to the modelling of market rates was developed in a series of papers by Hunt et al. (1996, 2000) and Hunt and Kennedy (1996, 1997). Since special emphasis is put there on the existence of an underlying low-dimensional Markov process that directly governs the dynamics of interest rates, this alternative approach is termed the Markov-functional approach. This property leads to a considerable simplification in the numerical procedures associated with the model's implementation. Another important feature of this approach is its ability to provide a perfect fit to market prices of a given family of interest-rate options.

2 Modelling of forward Libor rates

In this section, we present various approaches to the modelling of forward Libor rates. We focus here on the model's construction, its basic properties, and the valuation of the most typical derivatives. For further details, the interested reader is referred to the original papers: Musiela and Sondermann (1993), Sandmann and Sondermann (1993), Goldys et al. (1994), Sandmann et al. (1995), Brace et al. (1997), Jamshidian (1997), Miltersen et al. (1997), Musiela and Rutkowski (1997b), Rady (1997), Sandmann and Sondermann (1997), Rutkowski (1998, 1999), Glasserman and Kou (1999), and Yasuoka (1999). The issues related to the model's implementation are extensively treated in Brace (1996), Andersen and Andreasen (1997), Sidenius (1997), Brace et al. (1998), Musiela and Sawa (1998), Hull and White (1999), Schlögl (1999), Uratani and Utsunomiya (1999), Yasuoka (1999), Lotz and Schlögl (2000), Glasserman and Zhao (2000), Brace and Womersley (2000), and Dun et al. (2000).

2.1 Forward and futures Libor rates

Our first task is to examine those properties of forward and futures contracts related to the notion of the Libor rate which are universal; that is, which do not rely on specific assumptions imposed on a particular model of the term structure of interest rates. To this end, we fix an index $j$, and we consider various interest-rate sensitive derivatives related to the period $[T_j, T_{j+1}]$. To be more specific, we shall focus in this section on single-period forward swaps, that is, forward rate agreements.

We need to introduce some notation. We assume that we are given a prespecified collection of reset/settlement dates $0 < T_0 < T_1 < \cdots < T_n = T^*$, referred to as the tenor structure. Also, we denote $\delta_j = T_j - T_{j-1}$ for $j = 1, \ldots, n$. We write $B(t,T_j)$ to denote the price at time $t$ of a $T_j$-maturity zero-coupon bond. $\mathbb{P}^*$ is the spot martingale measure, while for any $j = 0, \ldots, n$ we write $\mathbb{P}_{T_j}$ to denote the forward martingale measure associated with the date $T_j$. The corresponding $d$-dimensional Brownian motions are denoted by $W^*$ and $W^{T_j}$, respectively. Also, we write $F_B(t,T,U) = B(t,T)/B(t,U)$ so that

$$F_B(t,T_{j+1},T_j) = \frac{B(t,T_{j+1})}{B(t,T_j)}, \quad \forall\, t \in [0,T_j],$$

is the forward price at time $t$ of the $T_{j+1}$-maturity zero-coupon bond for the settlement date $T_j$. We use the symbol $\pi_t(X)$ to denote the value (i.e., the arbitrage price) at time $t$ of a European contingent claim $X$. Finally, we shall use the letter $\mathcal{E}$ for the Doléans exponential, for instance,

$$\mathcal{E}_t\Big(\int_0^{\cdot} \gamma_u\cdot dW_u^*\Big) = \exp\Big(\int_0^t \gamma_u\cdot dW_u^* - \frac{1}{2}\int_0^t |\gamma_u|^2\,du\Big),$$

where the dot '$\cdot$' and $|\cdot|$ stand for the inner product and Euclidean norm in $\mathbb{R}^d$, respectively.

2.1.1 Single-period swaps settled in arrears

Let us first consider a single-period swap agreement settled in arrears; i.e., with the reset date $T_j$ and the settlement date $T_{j+1}$ (multi-period interest rate swaps are examined in Section 3). By the contractual features, the long party pays $\delta_{j+1}\kappa$ and receives $B^{-1}(T_j,T_{j+1}) - 1$ at time $T_{j+1}$. Equivalently, he pays an amount $Y_1 = 1 + \delta_{j+1}\kappa$ and receives $Y_2 = B^{-1}(T_j,T_{j+1})$ at this date. The values at time $t \le T_j$ of these payoffs are

$$\pi_t(Y_1) = B(t,T_{j+1})\big(1 + \delta_{j+1}\kappa\big), \qquad \pi_t(Y_2) = B(t,T_j).$$

The second equality above is trivial, since the payoff $Y_2$ is equivalent to the unit payoff at time $T_j$. Consequently, for any fixed $t \le T_j$, the value of the forward swap rate, which makes the contract worthless at time $t$, can be found by solving for $\kappa = \kappa(t,T_j,T_{j+1})$ the following equation:

$$\pi_t(Y_2) - \pi_t(Y_1) = B(t,T_j) - B(t,T_{j+1})\big(1 + \delta_{j+1}\kappa\big) = 0.$$

It is thus apparent that

$$\kappa(t,T_j,T_{j+1}) = \frac{B(t,T_j) - B(t,T_{j+1})}{\delta_{j+1}B(t,T_{j+1})}, \quad \forall\, t \in [0,T_j].$$
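The single-period forward swap rate just derived depends only on two discount factors and the accrual period. A minimal sketch follows; the function names `forward_rate` and `swap_value` and the sample discount factors are illustrative assumptions.

```python
def forward_rate(b_start, b_end, accrual):
    """kappa(t, T_j, T_{j+1}) = (B(t,T_j) - B(t,T_{j+1})) / (delta_{j+1} B(t,T_{j+1}))."""
    if accrual <= 0.0 or b_end <= 0.0:
        raise ValueError("accrual period and bond price must be positive")
    return (b_start - b_end) / (accrual * b_end)


def swap_value(b_start, b_end, accrual, kappa):
    """pi_t(Y2) - pi_t(Y1) = B(t,T_j) - B(t,T_{j+1}) (1 + delta_{j+1} kappa)."""
    return b_start - b_end * (1.0 + accrual * kappa)


# Hypothetical discount factors B(t, T_j), B(t, T_{j+1}) and a quarterly accrual.
b_j, b_j1, delta = 0.97, 0.955, 0.25
kappa = forward_rate(b_j, b_j1, delta)
value_at_kappa = swap_value(b_j, b_j1, delta, kappa)
```

By construction `value_at_kappa` vanishes up to rounding, which is the defining property of the forward swap rate; a fixed rate above `kappa` makes the contract worth less than zero to the long party.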

Note that the forward swap rate $\kappa(t,T_j,T_{j+1})$ coincides with the forward Libor rate $L(t,T_j)$ which, by the market convention, is set to satisfy

$$1 + \delta_{j+1}L(t,T_j) = \frac{B(t,T_j)}{B(t,T_{j+1})} = \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) \tag{2.1}$$

for every $t \in [0,T_j]$. Let us notice that the last equality is a consequence of the definition of the forward measure $\mathbb{P}_{T_{j+1}}$. We conclude that in order to determine the forward Libor rate $L(\cdot,T_j)$, it is enough to find the forward price $F_X(t,T_{j+1})$ at time $t$ of the contingent claim $X = B^{-1}(T_j,T_{j+1})$ in the forward contract that settles at time $T_{j+1}$. Indeed, it is well known (see, for instance, Musiela and Rutkowski (1997a)) that

$$F_X(t,T_{j+1}) = \frac{\pi_t(X)}{B(t,T_{j+1})} = \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big).$$

Furthermore, it is evident that the process $L(\cdot,T_j)$ necessarily follows a martingale under the forward probability measure $\mathbb{P}_{T_{j+1}}$. Recall that in the Heath–Jarrow–Morton framework, we have, under $\mathbb{P}_{T_{j+1}}$,

$$dF_B(t,T_j,T_{j+1}) = F_B(t,T_j,T_{j+1})\big(b(t,T_j) - b(t,T_{j+1})\big)\cdot dW_t^{T_{j+1}}, \tag{2.2}$$

where, for each maturity date $T$, the process $b(\cdot,T)$ represents the price volatility of the $T$-maturity zero-coupon bond. On the other hand, if the process $L(\cdot,T_j)$ is strictly positive, it can be shown to admit the following representation¹

$$dL(t,T_j) = L(t,T_j)\,\lambda(t,T_j)\cdot dW_t^{T_{j+1}},$$

where $\lambda(\cdot,T_j)$ is an adapted stochastic process which satisfies mild integrability conditions. Combining the last two formulae with (2.1), we arrive at the following fundamental relationship, which plays an essential role in the construction of the lognormal model of forward Libor rates,

$$\frac{\delta_{j+1}L(t,T_j)}{1 + \delta_{j+1}L(t,T_j)}\,\lambda(t,T_j) = b(t,T_j) - b(t,T_{j+1}), \quad \forall\, t \in [0,T_j]. \tag{2.3}$$

¹ This representation is a consequence of the martingale representation property of the standard Brownian motion.
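The step from (2.1) and (2.2) to (2.3) can be spelled out as follows. This is only a sketch of the standard argument, using the text's own symbols: differentiate (2.1), then substitute the two diffusion representations.

```latex
\begin{aligned}
\delta_{j+1}\,dL(t,T_j) &= dF_B(t,T_j,T_{j+1})
  && \text{since } 1+\delta_{j+1}L(t,T_j) = F_B(t,T_j,T_{j+1}) \text{ by (2.1)},\\
\delta_{j+1}L(t,T_j)\,\lambda(t,T_j)\cdot dW_t^{T_{j+1}}
  &= \bigl(1+\delta_{j+1}L(t,T_j)\bigr)\bigl(b(t,T_j)-b(t,T_{j+1})\bigr)\cdot dW_t^{T_{j+1}}
  && \text{by (2.2) and the representation of } dL.
\end{aligned}
```

Equating the integrands and dividing by $1+\delta_{j+1}L(t,T_j)$ gives (2.3).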

For instance, in the construction which is based on backward induction, relationship (2.3) will allow us to determine the forward measure for the date $T_j$, provided that $\mathbb{P}_{T_{j+1}}$, $W^{T_{j+1}}$ and the volatility $\lambda(t,T_j)$ of the forward Libor rate $L(\cdot,T_j)$ are known. (One may assume, for instance, that $\lambda(\cdot,T_j)$ is a prespecified deterministic function.) Recall that in the Heath–Jarrow–Morton framework² the Radon–Nikodým density of $\mathbb{P}_{T_j}$ with respect to $\mathbb{P}_{T_{j+1}}$ is known to satisfy

$$\frac{d\mathbb{P}_{T_j}}{d\mathbb{P}_{T_{j+1}}} = \mathcal{E}_{T_j}\Big(\int_0^{\cdot}\big(b(t,T_j) - b(t,T_{j+1})\big)\cdot dW_t^{T_{j+1}}\Big). \tag{2.4}$$

In view of (2.3), we thus have

$$\frac{d\mathbb{P}_{T_j}}{d\mathbb{P}_{T_{j+1}}} = \mathcal{E}_{T_j}\Big(\int_0^{\cdot}\frac{\delta_{j+1}L(t,T_j)}{1 + \delta_{j+1}L(t,T_j)}\,\lambda(t,T_j)\cdot dW_t^{T_{j+1}}\Big).$$

For our further purposes, it is also useful to observe that this density admits the following representation

$$\frac{d\mathbb{P}_{T_j}}{d\mathbb{P}_{T_{j+1}}} = cF_B(T_j,T_j,T_{j+1}) = c\big(1 + \delta_{j+1}L(T_j,T_j)\big), \quad \mathbb{P}_{T_{j+1}}\text{-a.s.}, \tag{2.5}$$

where $c > 0$ is the normalizing constant, and thus

$$\frac{d\mathbb{P}_{T_j}}{d\mathbb{P}_{T_{j+1}}}\bigg|_{\mathcal{F}_t} = cF_B(t,T_j,T_{j+1}) = c\big(1 + \delta_{j+1}L(t,T_j)\big), \quad \mathbb{P}_{T_{j+1}}\text{-a.s.}$$

Finally, the dynamics of the process $L(\cdot,T_j)$ under the probability measure $\mathbb{P}_{T_j}$ are given by a somewhat involved stochastic differential equation

$$dL(t,T_j) = L(t,T_j)\Big(\frac{\delta_{j+1}L(t,T_j)\,|\lambda(t,T_j)|^2}{1 + \delta_{j+1}L(t,T_j)}\,dt + \lambda(t,T_j)\cdot dW_t^{T_j}\Big).$$

As we shall see in what follows, it is nevertheless not hard to determine the probability law of $L(\cdot,T_j)$ under the forward measure $\mathbb{P}_{T_j}$, at least in the case of a deterministic volatility $\lambda(\cdot,T_j)$ of the forward Libor rate.

² See Heath et al. (1992) or Chapter 13 in Musiela and Rutkowski (1997a).

2.1.2 Single-period swaps settled in advance

Consider now a similar swap which is, however, settled in advance, that is, at time $T_j$. Our first goal is to determine the forward swap rate implied by such a contract. Note that under the present assumptions, the long party (formally) pays an amount $Y_1 = 1 + \delta_{j+1}\kappa$ and receives $Y_2 = B^{-1}(T_j,T_{j+1})$ at the settlement date $T_j$ (which coincides here with the reset date). The values at time $t \le T_j$ of these payoffs admit the following representations

$$\pi_t(Y_1) = B(t,T_j)\big(1 + \delta_{j+1}\kappa\big), \qquad \pi_t(Y_2) = B(t,T_j)\,\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big).$$

The value $\kappa = \hat{\kappa}(t,T_j,T_{j+1})$ of the modified forward swap rate, which makes the swap agreement settled in advance worthless at time $t$, can be found from the equality

$$\pi_t(Y_2) - \pi_t(Y_1) = B(t,T_j)\Big(\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) - (1 + \delta_{j+1}\kappa)\Big) = 0.$$

It is clear that

$$\hat{\kappa}(t,T_j,T_{j+1}) = \delta_{j+1}^{-1}\Big(\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) - 1\Big).$$

We are in a position to introduce the modified forward Libor rate $\tilde{L}(t,T_j)$ by setting, for every $t \in [0,T_j]$,

$$\tilde{L}(t,T_j) := \delta_{j+1}^{-1}\Big(\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) - 1\Big).$$

Let us make two remarks. First, it is clear that finding the modified forward Libor rate $\tilde{L}(\cdot,T_j)$ is formally equivalent to finding the forward price of the claim $B^{-1}(T_j,T_{j+1})$ for the settlement date $T_j$.³ Second, it is useful to observe that

$$\tilde{L}(t,T_j) = \mathbb{E}_{\mathbb{P}_{T_j}}\Big(\frac{1 - B(T_j,T_{j+1})}{\delta_{j+1}B(T_j,T_{j+1})}\,\Big|\,\mathcal{F}_t\Big) = \mathbb{E}_{\mathbb{P}_{T_j}}\big(L(T_j,T_j) \mid \mathcal{F}_t\big). \tag{2.6}$$

In particular, it is evident that at the reset date $T_j$ the two kinds of forward Libor rates introduced above coincide, since manifestly

$$\tilde{L}(T_j,T_j) = \frac{1 - B(T_j,T_{j+1})}{\delta_{j+1}B(T_j,T_{j+1})} = L(T_j,T_j).$$

³ Recall that in the case of a forward Libor rate, the settlement date was $T_{j+1}$.

To summarize, the "standard" forward Libor rate $L(\cdot,T_j)$ satisfies

$$L(t,T_j) = \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big(L(T_j,T_j) \mid \mathcal{F}_t\big), \quad \forall\, t \in [0,T_j],$$

with the initial condition

$$L(0,T_j) = \frac{B(0,T_j) - B(0,T_{j+1})}{\delta_{j+1}B(0,T_{j+1})}.$$

On the other hand, for the modified Libor rate $\tilde{L}(\cdot,T_j)$ we have

$$\tilde{L}(t,T_j) = \mathbb{E}_{\mathbb{P}_{T_j}}\big(\tilde{L}(T_j,T_j) \mid \mathcal{F}_t\big), \quad \forall\, t \in [0,T_j],$$

with the initial condition

$$\tilde{L}(0,T_j) = \delta_{j+1}^{-1}\Big(\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1})\big) - 1\Big).$$

The calculation of the right-hand side above involves not only the initial term structure, but also the volatilities of bond prices (for more details, we refer to Rutkowski (1998)).

2.1.3 Eurodollar futures contracts

The next object of our studies is the futures Libor rate. A Eurodollar futures contract is a futures contract in which the Libor rate plays the role of an underlying asset. By convention, at the contract's maturity date $T_j$, the quoted Eurodollar futures price, denoted by $E(T_j,T_j)$, is set to satisfy

$$E(T_j,T_j) := 1 - \delta_{j+1}L(T_j,T_j).$$

Equivalently, in terms of the zero-coupon bond price we have $E(T_j,T_j) = 2 - B^{-1}(T_j,T_{j+1})$. From the general theory, it follows that the Eurodollar futures price at time $t \le T_j$ equals

$$E(t,T_j) := \mathbb{E}_{\mathbb{P}^*}\big(E(T_j,T_j) \mid \mathcal{F}_t\big) = 2 - \mathbb{E}_{\mathbb{P}^*}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) \tag{2.7}$$

(recall that $\mathbb{P}^*$ represents the spot martingale measure in a given model of the term structure). It is thus natural to introduce the concept of the futures Libor rate, associated with the Eurodollar futures contract, through the following definition.

Definition 2.1 Let $E(t,T_j)$ be the Eurodollar futures price at time $t$ for the settlement date $T_j$. The implied futures Libor rate $L^f(t,T_j)$ satisfies

$$E(t,T_j) = 1 - \delta_{j+1}L^f(t,T_j), \quad \forall\, t \in [0,T_j]. \tag{2.8}$$

It follows immediately from (2.7)–(2.8) that the following equality is valid:

$$1 + \delta_{j+1}L^f(t,T_j) = \mathbb{E}_{\mathbb{P}^*}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big). \tag{2.9}$$

Equivalently, we have

$$L^f(t,T_j) = \mathbb{E}_{\mathbb{P}^*}\big(L(T_j,T_j) \mid \mathcal{F}_t\big) = \mathbb{E}_{\mathbb{P}^*}\big(\tilde{L}(T_j,T_j) \mid \mathcal{F}_t\big).$$

Note that in any term structure model, the futures Libor rate necessarily follows a martingale under the spot martingale measure $\mathbb{P}^*$ (provided, of course, that $\mathbb{P}^*$ is well-defined in this model).

2.2 Lognormal models of forward Libor rates

We shall now describe alternative approaches to the modelling of forward Libor rates in continuous- and discrete-tenor setups.

2.2.1 The Miltersen–Sandmann–Sondermann approach

The first attempt to provide a rigorous construction of a lognormal model of forward Libor rates was made by Miltersen et al. (1997). The interested reader is referred also to Musiela and Sondermann (1993), Goldys et al. (1994), and Sandmann et al. (1995) for related previous studies. As a starting point in their approach, Miltersen et al. (1997) postulate that the forward Libor rate process $L(\cdot,T)$ satisfies

$$dL(t,T) = \mu(t,T)\,dt + L(t,T)\,\lambda(t,T)\cdot dW_t^*,$$

with a deterministic volatility function $\lambda(\cdot,T) : [0,T] \to \mathbb{R}^d$. It is not difficult to deduce from the last formula that the forward price of a zero-coupon bond satisfies

$$dF(t,T+\delta,T) = -F(t,T+\delta,T)\big(1 - F(t,T+\delta,T)\big)\,\lambda(t,T)\cdot dW_t^T.$$

Subsequently, they focus on the partial differential equation satisfied by the function $v = v(t,x)$, which expresses the forward price of the bond option in terms of the forward bond price. The PDE for the option's price is

$$\frac{\partial v}{\partial t} + \frac{1}{2}\,|\lambda(t,T)|^2\,x^2(1-x)^2\,\frac{\partial^2 v}{\partial x^2} = 0 \tag{2.10}$$

with the terminal condition $v(T,x) = (K - x)^+$. It is interesting to note that the PDE (2.10) was previously solved by Rady and Sandmann (1994), who worked within a different framework, however.⁴ As a result, Miltersen et al. (1997) obtained not only the closed-form solution for the price of a bond option (this was already achieved in Rady and Sandmann (1994)), but also the "market formula" for the caplet's price. A rigorous approach to the problem of existence of such a model was presented by Brace et al. (1997), who also worked within the continuous-time Heath–Jarrow–Morton framework.

2.2.2 Brace–Gątarek–Musiela approach

To formally introduce the notion of a forward Libor rate, we assume that we are given a family $B(t,T)$ of bond prices, and thus also the collection $F_B(t,T,U)$ of forward processes. In contrast to the previous section, we shall now assume that a strictly positive real number $\delta < T^*$, which represents the length of the accrual period, is fixed throughout. By definition, the forward $\delta$-Libor rate $L(t,T)$ for the future date $T \le T^* - \delta$ prevailing at time $t$ is given by the conventional market formula

$$1 + \delta L(t,T) = F_B(t,T,T+\delta), \quad \forall\, t \in [0,T]. \tag{2.11}$$

The forward Libor rate $L(t,T)$ represents the add-on rate prevailing at time $t$ over the future time interval $[T,T+\delta]$. We can also re-express $L(t,T)$ directly in terms of bond prices, as for any $T \in [0,T^*-\delta]$, we have

$$1 + \delta L(t,T) = \frac{B(t,T)}{B(t,T+\delta)}, \quad \forall\, t \in [0,T]. \tag{2.12}$$

In particular, the initial term structure of forward Libor rates satisfies

$$L(0,T) = \delta^{-1}\Big(\frac{B(0,T)}{B(0,T+\delta)} - 1\Big). \tag{2.13}$$

⁴ In fact, they were concerned with the valuation of options on zero-coupon bonds for the term structure model put forward by Bühler and Käsler (1989).

Given a family $F_B(t,T,T^*)$ of forward processes, it is not hard to derive the dynamics of the associated family of forward Libor rates. For instance, one finds that under the forward measure $\mathbb{P}_{T+\delta}$, we have

$$dL(t,T) = \delta^{-1}F_B(t,T,T+\delta)\,\gamma(t,T,T+\delta)\cdot dW_t^{T+\delta},$$

where $\mathbb{P}_{T+\delta}$ is the forward measure for the date $T+\delta$, and the associated Wiener process $W^{T+\delta}$ equals

$$W_t^{T+\delta} = W_t^* - \int_0^t b(u,T+\delta)\,du, \quad \forall\, t \in [0,T+\delta].$$

Put another way, the process $L(\cdot,T)$ solves the equation

$$dL(t,T) = \delta^{-1}\big(1 + \delta L(t,T)\big)\,\gamma(t,T,T+\delta)\cdot dW_t^{T+\delta}, \tag{2.14}$$

subject to the initial condition (2.13). Suppose that forward Libor rates $L(t,T)$ are strictly positive. Then formula (2.14) can be rewritten as follows:

$$dL(t,T) = L(t,T)\,\lambda(t,T)\cdot dW_t^{T+\delta}, \tag{2.15}$$

where for any $t \in [0,T]$

$$\lambda(t,T) = \frac{1 + \delta L(t,T)}{\delta L(t,T)}\,\gamma(t,T,T+\delta). \tag{2.16}$$

This shows that the collection of forward processes uniquely specifies the family of forward Libor rates. The construction of a model of forward Libor rates relies on the following assumptions.

(LR.1) For any maturity $T \le T^* - \delta$, we are given an $\mathbb{R}^d$-valued, bounded deterministic function⁵ $\lambda(\cdot,T)$, which represents the volatility of the forward Libor rate process $L(\cdot,T)$.

(LR.2) We assume a strictly decreasing and strictly positive initial term structure $B(0,T)$, $T \in [0,T^*]$. The associated initial term structure $L(0,T)$ of forward Libor rates satisfies, for every $T \in [0,T^*-\delta]$,

$$L(0,T) = \frac{B(0,T) - B(0,T+\delta)}{\delta B(0,T+\delta)}. \tag{2.17}$$

⁵ Volatility $\lambda$ could well follow an adapted stochastic process; we deliberately focus here on a lognormal model of forward Libor rates in which $\lambda$ is deterministic.

To construct a model satisfying (LR.1)–(LR.2), Brace et al. (1997) place themselves in the Heath–Jarrow–Morton setup and they assume that for every $T \in [0,T^*]$, the volatility $b(t,T)$ vanishes for every $t \in [(T-\delta) \vee 0, T]$. In essence, the construction elaborated in Brace et al. (1997) is based on forward induction, as opposed to the backward induction which we shall use in the next section. They start by postulating that the dynamics of $L(t,T)$ under the spot martingale measure $\mathbb{P}^*$ are governed by the following SDE:

$$dL(t,T) = \mu(t,T)\,dt + L(t,T)\,\lambda(t,T)\cdot dW_t^*,$$

where $\lambda$ is a deterministic function, and the drift coefficient $\mu$ is unspecified. Recall that the arbitrage-free dynamics of the instantaneous forward rate $f(t,T)$ are

$$df(t,T) = \sigma(t,T)\cdot\sigma^*(t,T)\,dt + \sigma(t,T)\cdot dW_t^*,$$

where $\sigma^*(t,T) = \int_t^T \sigma(t,u)\,du = -b(t,T)$. On the other hand, the relationship (cf. (2.12))

$$1 + \delta L(t,T) = \exp\Big(\int_T^{T+\delta} f(t,u)\,du\Big) \tag{2.18}$$

is valid. Applying Itô's formula to both sides of (2.18), and comparing the diffusion terms, we find that

$$\sigma^*(t,T+\delta) - \sigma^*(t,T) = \int_T^{T+\delta}\sigma(t,u)\,du = \frac{\delta L(t,T)}{1 + \delta L(t,T)}\,\lambda(t,T).$$

To solve the last equation for $\sigma^*$ in terms of $L$, it is necessary to impose some sort of initial condition on $\sigma^*$. For instance, by setting $\sigma^*(t,T) = 0$ for $0 \le t \le T \le t + \delta$, we obtain the following relationship:

$$b(t,T) = -\sigma^*(t,T) = -\sum_{k=1}^{[\delta^{-1}(T-t)]}\frac{\delta L(t,T-k\delta)}{1 + \delta L(t,T-k\delta)}\,\lambda(t,T-k\delta). \tag{2.19}$$
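Formula (2.19) is a finite sum and is straightforward to evaluate once the Libor curve and its volatility are known. The sketch below assumes the one-dimensional case $d = 1$ and takes the curve $T \mapsto L(t,T)$ and the volatility $T \mapsto \lambda(t,T)$ as hypothetical callables `libor` and `lam`; the name `bond_volatility` is likewise an assumption made for illustration.

```python
import math


def bond_volatility(t, T, delta, libor, lam):
    """Evaluate b(t, T) from (2.19) in the scalar case d = 1.

    libor(t, s) and lam(t, s) are hypothetical callables returning
    L(t, s) and lambda(t, s); the sum runs over k = 1..floor((T - t) / delta).
    """
    n_terms = math.floor((T - t) / delta)
    total = 0.0
    for k in range(1, n_terms + 1):
        s = T - k * delta  # maturity of the k-th forward Libor rate in the sum
        rate = libor(t, s)
        total += delta * rate / (1.0 + delta * rate) * lam(t, s)
    return -total


# Flat 4% Libor curve with a flat 20% volatility and quarterly accrual.
flat_libor = lambda t, s: 0.04
flat_vol = lambda t, s: 0.20
b_one_year = bond_volatility(0.0, 1.0, 0.25, flat_libor, flat_vol)
b_short = bond_volatility(0.0, 0.2, 0.25, flat_libor, flat_vol)
```

With these inputs the one-year bond picks up four identical terms, while the bond maturing within one accrual period has zero volatility, in line with the assumption that $b(t,T)$ vanishes for $t \in [(T-\delta) \vee 0, T]$.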

The existence and uniqueness of solutions to the SDEs which govern the instantaneous forward rate $f(t,T)$ and the forward Libor rate $L(t,T)$ for $\sigma^*$ given by (2.19) can be shown using forward induction. Taking this result for granted, we conclude that $L(t,T)$ satisfies, under the spot martingale measure $\mathbb{P}^*$,

$$dL(t,T) = L(t,T)\,\sigma^*(t,T+\delta)\cdot\lambda(t,T)\,dt + L(t,T)\,\lambda(t,T)\cdot dW_t^*.$$

In this way, Brace et al. (1997) are able to completely specify their model of forward Libor rates.

2.2.3 Musiela–Rutkowski approach

In this section, we describe an alternative approach to the modelling of forward Libor rates; the construction presented below is a slight modification of that given by Musiela and Rutkowski (1997b). Let us start by introducing some notation. We assume that we are given a prespecified collection of reset/settlement dates $0 < T_0 < T_1 < \cdots < T_n = T^*$, referred to as the tenor structure (by convention, $T_{-1} = 0$). Let us denote $\delta_j = T_j - T_{j-1}$ for $j = 0, \ldots, n$. Then obviously $T_j = \sum_{i=0}^{j}\delta_i$ for every $j = 0, \ldots, n$. We find it convenient to denote, for $m = 0, \ldots, n$,

$$T_m^* = T^* - \sum_{j=n-m+1}^{n}\delta_j = T_{n-m}.$$

For any $j = 0, \ldots, n-1$, we define the forward Libor rate $L(\cdot,T_j)$ by setting

$$L(t,T_j) = \frac{B(t,T_j) - B(t,T_{j+1})}{\delta_{j+1}B(t,T_{j+1})}, \quad \forall\, t \in [0,T_j].$$

Definition 2.2 For any $j = 0, \ldots, n$, a probability measure $\mathbb{P}_{T_j}$ on $(\Omega, \mathcal{F}_{T_j})$, equivalent to $\mathbb{P}$, is said to be the forward Libor measure for the date $T_j$ if, for every $k = 0, \ldots, n$, the relative bond price

$$U_{n-j+1}(t,T_k) := \frac{B(t,T_k)}{\delta_j B(t,T_j)}, \quad \forall\, t \in [0, T_k \wedge T_j],$$

follows a local martingale under $\mathbb{P}_{T_j}$.

It is clear that the notion of forward Libor measure is in fact identical with that of a forward probability measure for a given date. Also, it is trivial to observe that the forward Libor rate $L(\cdot,T_j)$ necessarily follows a local martingale under the forward Libor measure for the date $T_{j+1}$. If, in addition, it is a strictly positive process, the existence of the associated volatility process can be justified by standard arguments. In our further development, we shall go the other way around; that is, we will assume that for any date $T_j$, the volatility $\lambda(\cdot,T_j)$ of the forward Libor rate $L(\cdot,T_j)$ is exogenously given. In principle, it can be a deterministic $\mathbb{R}^d$-valued function of time, an $\mathbb{R}^d$-valued function of the underlying forward Libor rates, or it can follow a $d$-dimensional adapted stochastic process. For simplicity, we assume throughout that the volatilities of forward Libor rates are bounded processes (or functions). To be more specific, we make the following standing assumptions.

Assumptions (LR) We are given a family of bounded adapted processes $\lambda(\cdot,T_j)$, $j = 0, \ldots, n-1$, which represent the volatilities of forward Libor rates $L(\cdot,T_j)$. In addition, we are given an initial term structure of interest rates, specified by a family $B(0,T_j)$, $j = 0, \ldots, n$, of bond prices. We assume here that $B(0,T_j) > B(0,T_{j+1})$ for $j = 0, \ldots, n-1$.

Our aim is to construct a family $L(\cdot,T_j)$, $j = 0, \ldots, n-1$, of forward Libor rates, a collection of mutually equivalent probability measures $\mathbb{P}_{T_j}$, $j = 1, \ldots, n$, and a family $W^{T_j}$, $j = 1, \ldots, n$, of processes in such a way that:

(i) for any $j = 1, \ldots, n$ the process $W^{T_j}$ follows a $d$-dimensional standard Brownian motion under the probability measure $\mathbb{P}_{T_j}$,

(ii) for any $j = 0, \ldots, n-1$, the forward Libor rate $L(\cdot,T_j)$ satisfies the SDE

$$dL(t,T_j) = L(t,T_j)\,\lambda(t,T_j)\cdot dW_t^{T_{j+1}}, \quad \forall\, t \in [0,T_j], \tag{2.20}$$

with the initial condition

$$L(0,T_j) = \frac{B(0,T_j) - B(0,T_{j+1})}{\delta_{j+1}B(0,T_{j+1})}.$$

As already mentioned, the construction of the model is based on backward induction, therefore we start by defining the forward Libor rate with the longest maturity, i.e., $T_{n-1}$. We postulate that $L(\cdot,T_{n-1}) = L(\cdot,T_1^*)$ is governed under the underlying probability measure $\mathbb{P}$ by the following SDE⁶

$$dL(t,T_1^*) = L(t,T_1^*)\,\lambda(t,T_1^*)\cdot dW_t,$$

with the initial condition

$$L(0,T_1^*) = \frac{B(0,T_1^*) - B(0,T^*)}{\delta_n B(0,T^*)}.$$

Put another way, we have

$$L(t,T_1^*) = \frac{B(0,T_1^*) - B(0,T^*)}{\delta_n B(0,T^*)}\;\mathcal{E}_t\Big(\int_0^{\cdot}\lambda(u,T_1^*)\cdot dW_u\Big).$$

⁶ Notice that, for simplicity, we have chosen the underlying probability measure $\mathbb{P}$ to play the role of the forward Libor measure for the date $T^*$. This choice is not essential, however.

Since $B(0,T_1^*) > B(0,T^*)$, it is clear that $L(\cdot,T_1^*)$ follows a strictly positive martingale under $\mathbb{P}_{T^*} = \mathbb{P}$. The next step is to define the forward Libor rate for the date $T_2^*$. For this purpose, we need to introduce first the forward probability measure for the date $T_1^*$. By definition, it is a probability measure $\mathbb{Q}$, which is equivalent to $\mathbb{P}$, and such that the processes

$$U_2(t,T_k^*) = \frac{B(t,T_k^*)}{\delta_{n-1}B(t,T_1^*)}$$

are $\mathbb{Q}$-local martingales. It is important to observe that, since $B(t,T_1^*) = B(t,T^*)\big(1 + \delta_n L(t,T_1^*)\big)$, the process $U_2(\cdot,T_k^*)$ admits the following representation:

$$U_2(t,T_k^*) = \frac{\delta_n\,U_1(t,T_k^*)}{\delta_{n-1}\big(\delta_n L(t,T_1^*) + 1\big)}.$$

Let us formulate an auxiliary result, which is a straightforward consequence of Itô's rule.

Lemma 2.3 Let $G$ and $H$ be real-valued adapted processes, such that

$$dG_t = \alpha_t\cdot dW_t, \qquad dH_t = \beta_t\cdot dW_t.$$

Assume, in addition, that $H_t > -1$ for every $t$ and denote $Y_t = (1 + H_t)^{-1}$. Then

$$d(Y_t G_t) = Y_t\big(\alpha_t - Y_t G_t\beta_t\big)\cdot\big(dW_t - Y_t\beta_t\,dt\big).$$

It follows immediately from Lemma 2.3 that

$$dU_2(t,T_k^*) = \eta_t^k\cdot\Big(dW_t - \frac{\delta_n L(t,T_1^*)}{1 + \delta_n L(t,T_1^*)}\,\lambda(t,T_1^*)\,dt\Big)$$

for a certain process $\eta^k$. Therefore it is enough to find a probability measure under which the process

$$W_t^{T_1^*} := W_t - \int_0^t\frac{\delta_n L(u,T_1^*)}{1 + \delta_n L(u,T_1^*)}\,\lambda(u,T_1^*)\,du = W_t - \int_0^t\gamma(u,T_1^*)\,du, \quad t \in [0,T_1^*],$$

follows a standard Brownian motion (the definition of $\gamma(\cdot,T_1^*)$ is clear from the context). This can be easily achieved using Girsanov's theorem, as we may put

$$\frac{d\mathbb{P}_{T_1^*}}{d\mathbb{P}} = \mathcal{E}_{T_1^*}\Big(\int_0^{\cdot}\gamma(u,T_1^*)\cdot dW_u\Big), \quad \mathbb{P}\text{-a.s.}$$

We are in a position to specify the dynamics of the forward Libor rate for the date $T_2^*$ under $\mathbb{P}_{T_1^*}$, i.e. we postulate that

$$dL(t,T_2^*) = L(t,T_2^*)\,\lambda(t,T_2^*)\cdot dW_t^{T_1^*},$$

with the initial condition

$$L(0,T_2^*) = \frac{B(0,T_2^*) - B(0,T_1^*)}{\delta_{n-1}B(0,T_1^*)}.$$

Let us now assume that we have found processes $L(\cdot,T_1^*), \ldots, L(\cdot,T_m^*)$. This means, in particular, that the forward Libor measure $\mathbb{P}_{T_{m-1}^*}$ and the associated Brownian motion $W^{T_{m-1}^*}$ are already specified. Our aim is to determine the forward Libor measure $\mathbb{P}_{T_m^*}$. Recalling that the accrual period of $L(\cdot,T_m^*) = L(\cdot,T_{n-m})$ has length $\delta_{n-m+1}$, it is easy to check that

$$U_{m+1}(t,T_k^*) = \frac{\delta_{n-m+1}\,U_m(t,T_k^*)}{\delta_{n-m}\big(\delta_{n-m+1}L(t,T_m^*) + 1\big)}.$$

Using Lemma 2.3, we obtain the following relationship:

$$W_t^{T_m^*} = W_t^{T_{m-1}^*} - \int_0^t\frac{\delta_{n-m+1}L(u,T_m^*)}{1 + \delta_{n-m+1}L(u,T_m^*)}\,\lambda(u,T_m^*)\,du$$

for $t \in [0,T_m^*]$. The forward Libor measure $\mathbb{P}_{T_m^*}$ can thus be easily found using Girsanov's theorem. Finally, we define the process $L(\cdot,T_{m+1}^*)$ as the solution to the SDE

$$dL(t,T_{m+1}^*) = L(t,T_{m+1}^*)\,\lambda(t,T_{m+1}^*)\cdot dW_t^{T_m^*},$$

with the initial condition

$$L(0,T_{m+1}^*) = \frac{B(0,T_{m+1}^*) - B(0,T_m^*)}{\delta_{n-m}B(0,T_m^*)}.$$
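The backward induction can be turned into a simulation scheme under the terminal measure $\mathbb{P}_{T^*}$: iterating the Girsanov shifts expresses every $W^{T_{j+1}}$ through $W^{T_n}$, so that $L(\cdot,T_j)$ acquires the drift $-L\,\lambda(t,T_j)\sum_{k=j+1}^{n-1}\gamma(t,T_k)$ with $\gamma(t,T_k) = \delta_{k+1}L(t,T_k)\lambda(t,T_k)/(1+\delta_{k+1}L(t,T_k))$. The sketch below assumes $d = 1$, constant volatilities, a log-Euler discretisation and a horizon before the first reset date $T_0$; the function names and all numerical inputs are illustrative assumptions rather than part of the original construction.

```python
import math
import random


def girsanov_shift(rates, vols, accruals, j):
    """Sum of gamma_k over k = j+1..n-1, where
    gamma_k = delta_{k+1} L_k lambda_k / (1 + delta_{k+1} L_k).

    rates[k] = L(t, T_k), vols[k] = lambda(t, T_k),
    accruals[k] = delta_{k+1} = T_{k+1} - T_k.
    """
    return sum(accruals[k] * rates[k] * vols[k] / (1.0 + accruals[k] * rates[k])
               for k in range(j + 1, len(rates)))


def euler_step(rates, vols, accruals, dt, dw):
    """One log-Euler step for all forward Libor rates under P_{T_n} (d = 1)."""
    new_rates = []
    for j, (rate, lam) in enumerate(zip(rates, vols)):
        # Drift of dL_j / L_j under the terminal measure; zero for j = n-1.
        drift = -lam * girsanov_shift(rates, vols, accruals, j)
        new_rates.append(rate * math.exp((drift - 0.5 * lam * lam) * dt + lam * dw))
    return new_rates


def simulate(rates, vols, accruals, horizon, n_steps, rng):
    """Simulate one path up to `horizon`, assumed not to exceed T_0."""
    dt = horizon / n_steps
    for _ in range(n_steps):
        rates = euler_step(rates, vols, accruals, dt, rng.gauss(0.0, math.sqrt(dt)))
    return rates


rng = random.Random(0)
initial = [0.030, 0.032, 0.034]   # hypothetical L(0, T_0), L(0, T_1), L(0, T_2)
vols = [0.20, 0.18, 0.16]         # hypothetical constant volatilities
accruals = [0.5, 0.5, 0.5]        # delta_1, delta_2, delta_3
terminal = simulate(initial, vols, accruals, horizon=0.25, n_steps=50, rng=rng)
```

The rate with the longest maturity carries no drift, since $\mathbb{P}_{T_n}$ is its own forward measure, while the exponential form of the step keeps every simulated rate strictly positive. A production scheme would also freeze each rate at its reset date, which this sketch omits.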

Remarks If the volatility coefficient $\lambda(\cdot,T_m) : [0,T_m] \to \mathbb{R}^d$ is a deterministic function, then for each date $t \in [0,T_m]$ the random variable $L(t,T_m)$ has a lognormal probability law under the forward probability measure $\mathbb{P}_{T_{m+1}}$.

Let us now examine the existence and uniqueness of the implied savings account,⁷ in a discrete-time setup. Intuitively, the value $B_t^*$ of a savings account at time $t$ can be interpreted as the cash amount accumulated up to time $t$ by rolling over a series of zero-coupon bonds with the shortest maturities available. To find the process $B^*$ in a discrete-tenor framework, we do not have to specify explicitly all bond prices; the knowledge of forward bond prices is sufficient. Indeed, it is clear that

$$F_B(t,T_j,T_{j+1}) = \frac{F_B(t,T_j,T^*)}{F_B(t,T_{j+1},T^*)} = \frac{B(t,T_j)}{B(t,T_{j+1})}.$$

This in turn yields, upon setting $t = T_j$,

$$F_B(T_j,T_j,T_{j+1}) = 1/B(T_j,T_{j+1}), \tag{2.21}$$

so that the price $B(T_j,T_{j+1})$ of a single-period bond is uniquely specified for every $j$. Though the bond that matures at time $T_j$ does not physically exist after this date, it seems justifiable to consider $F_B(T_j,T_j,T_{j+1})$ as its forward value at time $T_j$ for the next future date $T_{j+1}$. In other words, the spot value at time $T_{j+1}$ of one cash unit received at time $T_j$ equals $B^{-1}(T_j,T_{j+1})$.

⁷ The interested reader is referred to Musiela and Rutkowski (1997b) for the definition of an implied savings account in a continuous-time setup. See also Döberlein and Schweizer (1998) and Döberlein et al. (2000) for further developments and the general uniqueness result.

The discrete-time savings account $B^*$ thus equals (recall that $T_{-1} = 0$)

$$B_{T_k}^* = \prod_{j=0}^{k}F_B\big(T_{j-1},T_{j-1},T_j\big) = \prod_{j=0}^{k}B^{-1}\big(T_{j-1},T_j\big)$$

for $k = 0, \ldots, n$, since, by convention, we set $B_0^* = 1$. Note that

$$F_B\big(T_{j-1},T_{j-1},T_j\big) = 1 + \delta_j L(T_{j-1},T_{j-1}) > 1$$

for $j = 0, \ldots, n$, and since

$$B_{T_j}^* = F_B(T_{j-1},T_{j-1},T_j)\,B_{T_{j-1}}^*,$$

we find that $B_{T_j}^* > B_{T_{j-1}}^*$ for every $j = 0, \ldots, n$. We conclude that the implied savings account $B^*$ follows a strictly increasing discrete-time process. Let us define the probability measure $\mathbb{P}^*$ equivalent to $\mathbb{P}$ on $(\Omega, \mathcal{F}_{T^*})$ by the formula⁸

$$\frac{d\mathbb{P}^*}{d\mathbb{P}} = B_{T^*}^*\,B(0,T^*), \quad \mathbb{P}\text{-a.s.} \tag{2.22}$$

The probability measure $\mathbb{P}^*$ appears to be a plausible candidate for a spot martingale measure. Indeed, if we set

$$B(T_l,T_k) = \mathbb{E}_{\mathbb{P}^*}\big(B_{T_l}^*(B_{T_k}^*)^{-1} \mid \mathcal{F}_{T_l}\big) \tag{2.23}$$

for every $l \le k \le n$, then in the case of $l = k-1$, equality (2.23) coincides with (2.21). Let us observe that it is not possible to uniquely determine the continuous-time dynamics of a bond price $B(t,T_j)$ within the framework of the discrete-tenor model of forward Libor rates (the specification of forward Libor rates for all maturities is necessary for this purpose).

⁸ Recall that $\mathbb{P}$ plays the role of the forward Libor measure for the date $T^*$. Therefore, formula (2.22) is a consequence of the standard definition of a forward measure.

2.2.4 Jamshidian's approach

The backward induction approach to the modelling of forward Libor rates presented in the preceding section was re-examined and essentially generalized by Jamshidian (1997). In this section, we briefly present his approach to the modelling of forward Libor rates. As made apparent in the preceding section, in the direct modelling of Libor rates, no explicit reference is made to the bond price processes, which are used to formally define a forward Libor rate through equality (2.12). Nevertheless, to explain the idea that underpins Jamshidian's approach, we shall temporarily assume that we are given a family of bond prices $B(t,T_j)$ for the future dates $T_j$, $j = 1, \ldots, n$. By definition, the spot Libor measure is that probability measure equivalent to $\mathbb{P}$, under which all relative bond prices are local martingales, when the price process obtained by rolling over single-period bonds is taken as a numeraire. The existence of such a measure can be either postulated or derived from other conditions.⁹ Let us put, for $t \in [0,T^*]$ (as before $T_{-1} = 0$)

$$G_t = B(t,T_{m(t)})\prod_{j=0}^{m(t)}B^{-1}(T_{j-1},T_j), \tag{2.24}$$

where

$$m(t) = \inf\Big\{k = 0, 1, \ldots \;\Big|\; \sum_{i=0}^{k}\delta_i \ge t\Big\} = \inf\{k = 0, 1, \ldots \mid T_k \ge t\}.$$

It is easily seen that G t represents the wealth at time t of a portfolio which starts at time 0 with one unit of cash invested in a zero-coupon bond of maturity T0 , and whose wealth is then reinvested at each date T j , j = 0, . . . , n − 1, in zero-coupon bonds which mature at the next date; that is, T j+1 . Definition 2.4 A spot Libor measure, denoted by PL , is a probability measure on (, FT ∗ ) which is equivalent to P, and such that for any j = 0, . . . , n the relative bond price B(t, T j )/G t follows a local martingale under P L . Note that B(t, Tk+1 )/G t =

m(t) 0 j=0

−1 1 + δ j L(T j−1 , T j−1 )

k 0

1 + δ j L(t, T j−1 ) ,

j=m(t)+1

so that all relative bond prices B(t, T j )/G t , j = 0, . . . , n are uniquely determined by a collection of forward Libor rates. In this sense, G is the correct choice of the reference price process in the present setting. We shall now concentrate on the derivation of the dynamics under P L of forward Libor rates L(·, T j ), j = 0, . . . , n − 1. Our aim is to show that these dynamics involve only the volatilities of forward Libor rates (as opposed to volatilities of bond prices or other processes). Therefore, it is possible to define the whole family of forward Libor rates simultaneously under one probability measure (of course, this feature can also be deduced from the preceding construction). To facilitate the derivation of the dynamics of L(·, T j ), we postulate temporarily that bond prices B(t, T j ) follow Itˆo processes under the underlying probability measure P, more explicitly (2.25) d B(t, T j ) = B(t, T j ) a(t, T j ) dt + b(t, T j ) · dWt 9 One may assume, e.g., that bond prices B(t, T ) satisfy the weak no-arbitrage condition, meaning that there j ˜ equivalent to P, and such that all processes B(t, Tk )/B(t, T ∗ ) are P-local ˜ exists a probability measure P,

martingales.

10. Modelling of Forward Libor and Swap Rates

353

for every j = 0, . . . , n, where, as before, W is a d-dimensional standard Brownian motion under an underlying probability measure P (it should be stressed, however, that we do not assume here that P is a forward (or spot) martingale measure). Combining (2.24) with (2.25), we obtain (2.26) dG t = G t a(t, Tm(t) ) dt + b(t, Tm(t) ) · dWt . Furthermore, by applying Itˆo’s rule to the equality 1 + δ j+1 L(t, T j ) =

B(t, T j ) , B(t, T j+1 )

(2.27)

we find that d L(t, T j ) = µ(t, T j ) dt + ζ (t, T j ) · dWt , where µ(t, T j ) =

B(t, T j ) a(t, T j ) − a(t, T j+1 ) − ζ (t, T j )b(t, T j+1 ) δ j+1 B(t, T j+1 )

and ζ (t, T j ) =

B(t, T j ) b(t, T j ) − b(t, T j+1 ) . δ j+1 B(t, T j+1 )

(2.28)

Using (2.27) and the last formula, we arrive at the following relationship: b(t, Tm(t) ) − b(t, T j+1 ) =

j

δ k+1 ζ (t, Tk ) . 1 + δ k+1 L(t, Tk ) k=m(t)

(2.29)

By definition of a spot Libor measure P L , each relative price B(t, T j )/G t follows a local martingale under P L . Since, in addition, P L is assumed to be equivalent to P, it is clear that it is given by the Dol´eans exponential, that is · dP L h u · dWu , P-a.s. = ET ∗ dP 0 for some adapted process h. It it not hard to check, using Itˆo’s rule, that h necessarily satisfies, for t ∈ [0, T j ], a(t, T j ) − a(t, Tm(t) ) = b(t, Tm(t) ) − h t · b(t, T j ) − b(t, Tm(t) ) for every j = 0, . . . , n. Combining (2.28) with the last formula, we obtain B(t, T j ) a(t, T j ) − a(t, T j+1 ) = ζ (t, T j ) · b(t, Tm(t) ) − h t , δ j+1 B(t, T j+1 ) and this in turn yields d L(t, T j ) = ζ (t, T j ) ·

b(t, Tm(t) ) − b(t, T j+1 ) − h t dt + dWt .


Using (2.29), we conclude that process $L(\cdot, T_j)$ satisfies

$$dL(t, T_j) = \sum_{k=m(t)}^{j} \frac{\delta_{k+1}\, \zeta(t, T_k) \cdot \zeta(t, T_j)}{1 + \delta_{k+1} L(t, T_k)}\, dt + \zeta(t, T_j) \cdot dW^L_t,$$

where the process $W^L_t = W_t - \int_0^t h_u\, du$ follows a $d$-dimensional standard Brownian motion under the spot Libor measure $\mathbb{P}^L$. To further specify the model, we assume that processes $\zeta(t, T_j)$, $j = 0, \ldots, n - 1$, have the following form, for $t \in [0, T_j]$,

$$\zeta(t, T_j) = \lambda_j\bigl( t, L(t, T_j), L(t, T_{j+1}), \ldots, L(t, T_n) \bigr),$$

where $\lambda_j : [0, T_j] \times \mathbb{R}^{n-j+1} \to \mathbb{R}^d$ are given functions. In this way, we obtain a system of SDEs

$$dL(t, T_j) = \sum_{k=m(t)}^{j} \frac{\delta_{k+1}\, \lambda_k(t, L_k(t)) \cdot \lambda_j(t, L_j(t))}{1 + \delta_{k+1} L(t, T_k)}\, dt + \lambda_j(t, L_j(t)) \cdot dW^L_t,$$

where we write $L_j(t) = (L(t, T_j), L(t, T_{j+1}), \ldots, L(t, T_n))$. Under mild regularity assumptions, this system can be solved recursively, starting from $L(\cdot, T_{n-1})$. The lognormal model of forward Libor rates corresponds to the choice of $\zeta(t, T_j) = \lambda(t, T_j) L(t, T_j)$, where $\lambda(\cdot, T_j) : [0, T_j] \to \mathbb{R}^d$ is a deterministic function for every $j$.
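To make the structure of the drift term under the spot Libor measure more tangible, the following sketch discretises the lognormal case with a log-Euler scheme. It rests on simplifying assumptions that are not part of the text: a single driving Brownian motion ($d = 1$), constant scalar volatilities, and a uniform time grid. The function names and the input layout are purely illustrative.

```python
import math
import random


def next_reset_index(t, tenor):
    """Return m(t) = inf{k : T_k >= t} for tenor dates T_0 < ... < T_n."""
    for k, date in enumerate(tenor):
        if date >= t:
            return k
    raise ValueError("t lies beyond the last tenor date")


def spot_measure_drift(j, m, libor, vol, delta):
    """Drift of dL(t, T_j) / L(t, T_j) under the spot Libor measure.

    libor[k] = L(t, T_k), vol[k] = lambda(t, T_k) (scalar, so d = 1) and
    delta[k] = T_{k+1} - T_k, which is delta_{k+1} in the text.
    """
    return sum(
        delta[k] * vol[k] * libor[k] * vol[j] / (1.0 + delta[k] * libor[k])
        for k in range(m, j + 1)
    )


def simulate_forward_libors(libor0, vol, tenor, horizon, n_steps, rng):
    """Log-Euler path of (L(., T_0), ..., L(., T_{n-1})) up to `horizon`."""
    delta = [tenor[k + 1] - tenor[k] for k in range(len(tenor) - 1)]
    libor = list(libor0)
    dt = horizon / n_steps
    for step in range(n_steps):
        # Evaluate m(t) at the midpoint of the step, so that a rate stops
        # moving once its reset date T_j has been reached.
        m = next_reset_index((step + 0.5) * dt, tenor)
        z = rng.gauss(0.0, 1.0)  # one Brownian increment shared by all rates
        frozen = list(libor)     # start-of-step values used in every drift
        for j in range(m, len(libor)):
            mu = spot_measure_drift(j, m, frozen, vol, delta)
            libor[j] = frozen[j] * math.exp(
                (mu - 0.5 * vol[j] ** 2) * dt + vol[j] * math.sqrt(dt) * z
            )
    return libor


# Hypothetical inputs: three forward rates on a semi-annual tenor grid.
example_path = simulate_forward_libors(
    [0.030, 0.032, 0.034], [0.20, 0.20, 0.20],
    [0.5, 1.0, 1.5, 2.0], horizon=1.0, n_steps=200, rng=random.Random(0),
)
```

The drift of each rate only involves rates with indices between $m(t)$ and $j$, which is why a single loop over the rates that have not yet reset is enough. The log-Euler step keeps every simulated rate strictly positive, which a plain Euler step would not guarantee.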

2.3 Dynamics of Libor rates and bond prices

We assume that the volatilities of processes $L(\cdot, T_j)$ follow deterministic functions. Put another way, we place ourselves within the framework of the lognormal model of forward Libor rates. It is interesting to note that in all approaches, there is a uniquely determined correspondence between forward measures (and forward Brownian motions) associated with different dates $T_0, \ldots, T_n$. On the other hand, however, there is a considerable degree of ambiguity in the way in which the spot martingale measure is specified (in some instances, it is not introduced at all). Consequently, the futures Libor rate $L^f(\cdot, T_j)$, which equals (cf. Section 2.1.3)

$$L^f(t, T_j) = \mathbb{E}_{\mathbb{P}^*}\bigl( L(T_j, T_j) \mid \mathcal{F}_t \bigr) = \mathbb{E}_{\mathbb{P}^*}\bigl( \tilde{L}(T_j, T_j) \mid \mathcal{F}_t \bigr), \tag{2.30}$$

is not necessarily specified in the same way in various approaches to the lognormal model of forward Libor rates. For this reason, we start by examining the distributional properties of forward Libor rates, which are identical in all abovementioned models. For a given function $g : \mathbb{R} \to \mathbb{R}$ and a fixed date $u \le T_j$, we are interested in a payoff of the form $X = g(L(u, T_j))$ which settles at time $T_j$. Particular cases of such payoffs are

$$X_1 = g\bigl( B^{-1}(T_j, T_{j+1}) \bigr), \quad X_2 = g\bigl( B(T_j, T_{j+1}) \bigr), \quad X_3 = g\bigl( F_B(u, T_{j+1}, T_j) \bigr).$$

Recall that

$$B^{-1}(T_j, T_{j+1}) = 1 + \delta_{j+1} L(T_j, T_j) = 1 + \delta_{j+1} \tilde{L}(T_j, T_j) = 1 + \delta_{j+1} L^f(T_j, T_j).$$

The choice of the “pricing measure” is thus largely a matter of convenience. Similarly, we have

$$B(T_j, T_{j+1}) = \frac{1}{1 + \delta_{j+1} L(T_j, T_j)} = F_B(T_j, T_{j+1}, T_j). \tag{2.31}$$

More generally, the forward price of a $T_{j+1}$-maturity bond for the settlement date $T_j$ equals

$$F_B(u, T_{j+1}, T_j) = \frac{B(u, T_{j+1})}{B(u, T_j)} = \frac{1}{1 + \delta_{j+1} L(u, T_j)}. \tag{2.32}$$

Generally speaking, to value the claim $X = g(L(u, T_j)) = \tilde{g}(F_B(u, T_{j+1}, T_j))$ which settles at time $T_j$ we may use the formula

$$\pi_t(X) = B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}( X \mid \mathcal{F}_t ), \quad \forall\, t \in [0, T_j].$$

It is thus clear that to value a claim in the case $u \le T_j$, it is enough to know the dynamics of either $L(\cdot, T_j)$ or $F_B(\cdot, T_{j+1}, T_j)$ under the forward probability measure $\mathbb{P}_{T_j}$. If $u = T_j$, we may equally well use the dynamics, under $\mathbb{P}_{T_j}$, of either $\tilde{L}(\cdot, T_j)$ or $L^f(\cdot, T_j)$. For instance,

$$\pi_t(X_1) = B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\bigl( B^{-1}(T_j, T_{j+1}) \mid \mathcal{F}_t \bigr) = B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\bigl( F_B^{-1}(T_j, T_{j+1}, T_j) \mid \mathcal{F}_t \bigr),$$

but also

$$\pi_t(X_1) = B(t, T_j) \Bigl( 1 + \delta_{j+1}\, \mathbb{E}_{\mathbb{P}_{T_j}}\bigl( Z(T_j) \mid \mathcal{F}_t \bigr) \Bigr),$$

where $Z(T_j) = L(T_j, T_j) = \tilde{L}(T_j, T_j) = L^f(T_j, T_j)$.

2.3.1 Dynamics of $L(\cdot, T_j)$ under $\mathbb{P}_{T_j}$

We shall now derive the transition probability density function (p.d.f.) of the process $L(\cdot, T_j)$ under the forward probability measure $\mathbb{P}_{T_j}$. Let us first prove the following related result, due to Jamshidian (1997).

Proposition 2.5 Let $t \le u \le T_j$. Then

$$\mathbb{E}_{\mathbb{P}_{T_j}}\bigl( L(u, T_j) \mid \mathcal{F}_t \bigr) = L(t, T_j) + \frac{\delta_{j+1}\, \mathrm{Var}_{\mathbb{P}_{T_{j+1}}}\bigl( L(u, T_j) \mid \mathcal{F}_t \bigr)}{1 + \delta_{j+1} L(t, T_j)}. \tag{2.33}$$


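In the case of the lognormal model of Libor rates, we have

$$\mathbb{E}_{\mathbb{P}_{T_j}}\bigl( L(u, T_j) \mid \mathcal{F}_t \bigr) = L(t, T_j) \Biggl( 1 + \frac{\delta_{j+1} L(t, T_j) \bigl( e^{v_j^2(t,u)} - 1 \bigr)}{1 + \delta_{j+1} L(t, T_j)} \Biggr), \tag{2.34}$$

where

$$v_j^2(t, u) = \mathrm{Var}_{\mathbb{P}_{T_{j+1}}}\Bigl( \int_t^u \lambda(s, T_j) \cdot dW_s^{T_{j+1}} \Bigr) = \int_t^u |\lambda(s, T_j)|^2\, ds. \tag{2.35}$$

In particular, the modified Libor rate $\tilde{L}(t, T_j)$ satisfies¹⁰

$$\tilde{L}(t, T_j) = \mathbb{E}_{\mathbb{P}_{T_j}}\bigl( L(T_j, T_j) \mid \mathcal{F}_t \bigr) = L(t, T_j) \Biggl( 1 + \frac{\delta_{j+1} L(t, T_j) \bigl( e^{v_j^2(t,T_j)} - 1 \bigr)}{1 + \delta_{j+1} L(t, T_j)} \Biggr).$$

¹⁰ This equality can be referred to as the convexity correction.

Proof Combining (2.5) with the martingale property of the process $L(\cdot, T_j)$ under $\mathbb{P}_{T_{j+1}}$, we obtain

$$\mathbb{E}_{\mathbb{P}_{T_j}}\bigl( L(u, T_j) \mid \mathcal{F}_t \bigr) = \frac{\mathbb{E}_{\mathbb{P}_{T_{j+1}}}\bigl( (1 + \delta_{j+1} L(u, T_j)) L(u, T_j) \mid \mathcal{F}_t \bigr)}{1 + \delta_{j+1} L(t, T_j)}$$

so that

$$\mathbb{E}_{\mathbb{P}_{T_j}}\bigl( L(u, T_j) \mid \mathcal{F}_t \bigr) = L(t, T_j) + \frac{\delta_{j+1}\, \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\bigl( (L(u, T_j) - L(t, T_j))^2 \mid \mathcal{F}_t \bigr)}{1 + \delta_{j+1} L(t, T_j)}.$$

In the case of the lognormal model, we have

$$L(u, T_j) = L(t, T_j)\, e^{\eta_j(t,u) - \frac{1}{2} v_j^2(t,u)},$$

where

$$\eta_j(t, u) = \int_t^u \lambda(s, T_j) \cdot dW_s^{T_{j+1}}. \tag{2.36}$$

Consequently,

$$\mathbb{E}_{\mathbb{P}_{T_{j+1}}}\bigl( (L(u, T_j) - L(t, T_j))^2 \mid \mathcal{F}_t \bigr) = L^2(t, T_j) \bigl( e^{v_j^2(t,u)} - 1 \bigr).$$

This gives the desired equality (2.34). The last asserted equality is a consequence of (2.6).

To derive the transition probability density function (p.d.f.) of the process $L(\cdot, T_j)$, notice that for any $t \le u \le T_j$, and any bounded Borel measurable function $g : \mathbb{R} \to \mathbb{R}$ we have

$$\mathbb{E}_{\mathbb{P}_{T_j}}\bigl( g(L(u, T_j)) \mid \mathcal{F}_t \bigr) = \frac{\mathbb{E}_{\mathbb{P}_{T_{j+1}}}\bigl( g(L(u, T_j)) (1 + \delta_{j+1} L(u, T_j)) \mid \mathcal{F}_t \bigr)}{1 + \delta_{j+1} L(t, T_j)}.$$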


The following simple lemma appears to be useful.

Lemma 2.6 Let $\zeta$ be a nonnegative random variable on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with the probability density function $f_{\mathbb{P}}$. Let $\mathbb{Q}$ be a probability measure equivalent to $\mathbb{P}$. Suppose that for any bounded Borel measurable function $g : \mathbb{R} \to \mathbb{R}$ we have

$$\mathbb{E}_{\mathbb{P}}\bigl( g(\zeta) \bigr) = \mathbb{E}_{\mathbb{Q}}\bigl( (1 + \zeta) g(\zeta) \bigr).$$

Then the p.d.f. $f_{\mathbb{Q}}$ of $\zeta$ under $\mathbb{Q}$ satisfies $f_{\mathbb{P}}(y) = (1 + y) f_{\mathbb{Q}}(y)$.

Proof The assertion is in fact trivial since, by assumption,

$$\int_{-\infty}^{\infty} g(y) f_{\mathbb{P}}(y)\, dy = \int_{-\infty}^{\infty} g(y) (1 + y) f_{\mathbb{Q}}(y)\, dy$$

for any bounded Borel measurable function $g : \mathbb{R} \to \mathbb{R}$.

Assume the lognormal model of Libor rates and fix $x \in \mathbb{R}$. Recall that for any $t \le u$ we have

$$L(u, T_j) = L(t, T_j)\, e^{\eta_j(t,u) - \frac{1}{2} \mathrm{Var}_{\mathbb{P}_{T_{j+1}}}(\eta_j(t,u))},$$

where $\eta_j(t, u)$ is given by (2.36) (so that it is independent of the $\sigma$-field $\mathcal{F}_t$). The Markov property of $L(\cdot, T_j)$ under the forward measure $\mathbb{P}_{T_{j+1}}$ is thus apparent. Denote by $p_L(t, x; u, y)$ the transition p.d.f. under $\mathbb{P}_{T_{j+1}}$ of the process $L(\cdot, T_j)$. Elementary calculations involving Gaussian densities yield

$$p_L(t, x; u, y) = \mathbb{P}_{T_{j+1}}\{ L(u, T_j) = y \mid L(t, T_j) = x \} = \frac{1}{\sqrt{2\pi}\, v_j(t, u)\, y} \exp\Biggl\{ -\frac{\bigl( \ln(y/x) + \frac{1}{2} v_j^2(t, u) \bigr)^2}{2 v_j^2(t, u)} \Biggr\}$$

for any $x, y > 0$ and $t < u$. Taking into account Lemma 2.6, we conclude that the transition p.d.f. of the process¹¹ $L(\cdot, T_j)$, under the forward probability measure $\mathbb{P}_{T_j}$, satisfies

$$\tilde{p}_L(t, x; u, y) = \mathbb{P}_{T_j}\{ L(u, T_j) = y \mid L(t, T_j) = x \} = \frac{1 + \delta_{j+1} y}{1 + \delta_{j+1} x}\, p_L(t, x; u, y).$$

We are in a position to state the following result, which can be used, for instance, to value a contingent claim of the form $X = h(L(T_j))$ which settles at time $T_j$ (see Schmidt (1996)).

¹¹ The Markov property of $L(\cdot, T_j)$ under $\mathbb{P}_{T_j}$ can be easily deduced from the Markovian features of the forward price $F_B(\cdot, T_{j+1}, T_j)$ under $\mathbb{P}_{T_j}$ (see formulae (2.37)–(2.38)).


Corollary 2.7 The transition p.d.f. under $\mathbb{P}_{T_j}$ of the forward Libor rate $L(\cdot, T_j)$ equals, for any $t < u$ and $x, y > 0$,

$$\tilde{p}_L(t, x; u, y) = \frac{1 + \delta_{j+1} y}{\sqrt{2\pi}\, v_j(t, u)\, y\, (1 + \delta_{j+1} x)} \exp\Biggl\{ -\frac{\bigl( \ln(y/x) + \frac{1}{2} v_j^2(t, u) \bigr)^2}{2 v_j^2(t, u)} \Biggr\}.$$

2.3.2 Dynamics of $F_B(\cdot, T_{j+1}, T_j)$ under $\mathbb{P}_{T_j}$

Observe that the forward bond price $F_B(\cdot, T_{j+1}, T_j)$ satisfies

$$F_B(t, T_{j+1}, T_j) = \frac{B(t, T_{j+1})}{B(t, T_j)} = \frac{1}{1 + \delta_{j+1} L(t, T_j)}. \tag{2.37}$$

First, this implies that in the lognormal model of Libor rates, the dynamics of the forward bond price $F_B(\cdot, T_{j+1}, T_j)$ are governed by the following stochastic differential equation, under $\mathbb{P}_{T_j}$,

$$dF_B(t) = -F_B(t) \bigl( 1 - F_B(t) \bigr)\, \lambda(t, T_j) \cdot dW_t^{T_j}, \tag{2.38}$$

where we write $F_B(t) = F_B(t, T_{j+1}, T_j)$. If the initial condition satisfies $0 < F_B(0) < 1$, this equation can be shown to admit a unique strong solution (it satisfies $0 < F_B(t) < 1$ for every $t > 0$). This makes clear that the process $F_B(\cdot, T_{j+1}, T_j)$, and thus also the process $L(\cdot, T_j)$, are Markovian under $\mathbb{P}_{T_j}$. Using Corollary 2.7 and relationship (2.37), one can find the transition p.d.f. of the Markov process $F_B(\cdot, T_{j+1}, T_j)$ under $\mathbb{P}_{T_j}$; that is,

$$p_B(t, x; u, y) = \mathbb{P}_{T_j}\{ F_B(u, T_{j+1}, T_j) = y \mid F_B(t, T_{j+1}, T_j) = x \}.$$

We have the following result (see Rady and Sandmann (1994), Miltersen et al. (1997), and Jamshidian (1997)).

Corollary 2.8 The transition p.d.f. under $\mathbb{P}_{T_j}$ of the forward bond price $F_B(\cdot, T_{j+1}, T_j)$ equals, for any $t < u$ and arbitrary $0 < x, y < 1$,

$$p_B(t, x; u, y) = \frac{x}{\sqrt{2\pi}\, v_j(t, u)\, y^2 (1 - y)} \exp\Biggl\{ -\frac{\Bigl( \ln \frac{x(1-y)}{y(1-x)} + \frac{1}{2} v_j^2(t, u) \Bigr)^2}{2 v_j^2(t, u)} \Biggr\}.$$

Proof Let us fix $x \in (0, 1)$. Using (2.37), it is easy to show that

$$p_B(t, x; u, y) = \delta^{-1} y^{-2}\, \tilde{p}_L\Bigl( t, \frac{1 - x}{\delta x}; u, \frac{1 - y}{\delta y} \Bigr),$$

where $\delta = \delta_{j+1}$. The formula now follows from Corollary 2.7.


Let us observe that the results of this section can be applied to value the so-called irregular cash flows, such as caps or floors settled in advance (for more details on this issue we refer to Schmidt (1996)).
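As a numerical sanity check of Corollary 2.7 and formula (2.34), the sketch below integrates the transition density $\tilde{p}_L$ and verifies that it has unit mass and the convexity-adjusted mean. The parameter values in the usage lines, the integration routine and all function names are assumptions made for illustration only.

```python
import math


def lognormal_density(x, y, v):
    """p_L(t, x; u, y): density of L(u, T_j) under P_{T_{j+1}}, v = v_j(t, u)."""
    z = math.log(y / x) + 0.5 * v * v
    return math.exp(-z * z / (2.0 * v * v)) / (math.sqrt(2.0 * math.pi) * v * y)


def forward_measure_density(x, y, v, delta):
    """p~_L(t, x; u, y) of Corollary 2.7, the density under P_{T_j}."""
    return (1.0 + delta * y) / (1.0 + delta * x) * lognormal_density(x, y, v)


def expectation_under_forward_measure(payoff, x, v, delta, n=4000, width=12.0):
    """Trapezoid rule for E[payoff(L(u, T_j))] under P_{T_j}.

    The substitution y = x * exp(s) maps the integral to s in
    [-width * v, width * v]; the extra factor y is the Jacobian dy/ds.
    """
    lower = -width * v
    h = 2.0 * width * v / n
    total = 0.0
    for i in range(n + 1):
        y = x * math.exp(lower + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * payoff(y) * forward_measure_density(x, y, v, delta) * y
    return total * h


def convexity_adjusted_mean(x, v, delta):
    """Right-hand side of (2.34) with L(t, T_j) = x and v = v_j(t, u)."""
    return x * (1.0 + delta * x * (math.exp(v * v) - 1.0) / (1.0 + delta * x))


# Hypothetical inputs: L(t, T_j) = 5%, v_j(t, u) = 0.3, delta_{j+1} = 0.5.
mass = expectation_under_forward_measure(lambda y: 1.0, 0.05, 0.3, 0.5)
mean = expectation_under_forward_measure(lambda y: y, 0.05, 0.3, 0.5)
target = convexity_adjusted_mean(0.05, 0.3, 0.5)
```

Here `mass` should be close to one and `mean` close to `target`, up to quadrature error. The same routine could be reused with any bounded payoff `h` to approximate the value of a claim $h(L(T_j))$ settling at $T_j$.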

2.4 Caps and floors

An interest rate cap (known also as a ceiling rate agreement) is a contractual arrangement where the grantor (seller) has an obligation to pay cash to the holder (buyer) if a particular interest rate exceeds a mutually agreed level at some future date or dates. Similarly, in an interest rate floor, the grantor has an obligation to pay cash to the holder if the interest rate is below a preassigned level. When cash is paid to the holder, the holder’s net position is equivalent to borrowing (or depositing) at a rate fixed at that agreed level. This assumes that the holder of a cap (or floor) agreement also holds an underlying asset (such as a deposit) or an underlying liability (such as a loan). Finally, the holder is not affected by the agreement if the interest rate is ultimately more favorable to him than the agreed level. This feature of a cap (or floor) agreement makes it similar to an option. Specifically, a forward start cap (or a forward start floor) is a strip of caplets (floorlets), each of which is a call (put) option on a forward rate, respectively.

Let us denote by $\kappa$ and by $\delta_j$ the cap strike rate and the length of the accrual period, respectively. We shall check that an interest rate caplet (i.e., one leg of a cap) may also be seen as a put option with strike price 1 (per dollar of notional principal) which expires at the caplet start day on a discount bond with face value $1 + \kappa\delta_j$ which matures at the caplet end date. Similarly to swap agreements, interest rate caps and floors may be settled either in arrears or in advance. In a forward cap or floor, which starts at time $T_0$, and is settled in arrears at dates $T_j$, $j = 1, \ldots, n$, the cash flows at times $T_j$ are $N_p (L(T_{j-1}) - \kappa)^+ \delta_j$ and $N_p (\kappa - L(T_{j-1}))^+ \delta_j$, respectively, where $N_p$ stands for the notional principal (recall that $\delta_j = T_j - T_{j-1}$). As usual, the rate $L(T_{j-1}) = L(T_{j-1}, T_{j-1})$ is determined at the reset date $T_{j-1}$, and it satisfies

$$B(T_{j-1}, T_j)^{-1} = 1 + \delta_j L(T_{j-1}). \tag{2.39}$$

The price at time $t \le T_0$ of a forward cap, denoted by $FC_t$, is (we set $N_p = 1$)

$$FC_t = \sum_{j=1}^{n} \mathbb{E}_{\mathbb{P}^*}\Bigl( \frac{B_t}{B_{T_j}} (L(T_{j-1}) - \kappa)^+ \delta_j \,\Big|\, \mathcal{F}_t \Bigr) = \sum_{j=1}^{n} B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\bigl( (L(T_{j-1}) - \kappa)^+ \delta_j \mid \mathcal{F}_t \bigr). \tag{2.40}$$

On the other hand, since the cash flow of the $j$th caplet at time $T_j$ is manifestly an $\mathcal{F}_{T_{j-1}}$-measurable random variable, we may directly express the value of the cap in terms of expectations under forward measures $\mathbb{P}_{T_{j-1}}$, $j = 1, \ldots, n$. Indeed, we have

$$FC_t = \sum_{j=1}^{n} B(t, T_{j-1})\, \mathbb{E}_{\mathbb{P}_{T_{j-1}}}\bigl( B(T_{j-1}, T_j) (L(T_{j-1}) - \kappa)^+ \delta_j \mid \mathcal{F}_t \bigr). \tag{2.41}$$

Consequently, using (2.39) we get the equality

$$FC_t = \sum_{j=1}^{n} B(t, T_{j-1})\, \mathbb{E}_{\mathbb{P}_{T_{j-1}}}\Bigl( \bigl( 1 - \tilde{\delta}_j B(T_{j-1}, T_j) \bigr)^+ \,\Big|\, \mathcal{F}_t \Bigr), \tag{2.42}$$

which is valid for every $t \in [0, T]$. It is apparent that a caplet is essentially equivalent to a put option on a zero-coupon bond; it may also be seen as an option on a single-period swap. The equivalence of a caplet and a put option on a zero-coupon bond can be explained in an intuitive way. For this purpose, it is enough to examine two basic features of both contracts: the exercise set and the payoff value. Let us consider the $j$th caplet. A caplet is exercised at time $T_{j-1}$ if and only if $L(T_{j-1}) - \kappa > 0$, or, equivalently, if

$$B(T_{j-1}, T_j)^{-1} = 1 + L(T_{j-1})(T_j - T_{j-1}) > 1 + \kappa\delta_j = \tilde{\delta}_j.$$

The last inequality holds whenever $\tilde{\delta}_j B(T_{j-1}, T_j) < 1$. This shows that both of the considered options are exercised in the same circumstances. If exercised, the caplet pays $\delta_j (L(T_{j-1}) - \kappa)$ at time $T_j$, or equivalently

$$\delta_j B(T_{j-1}, T_j) (L(T_{j-1}) - \kappa) = 1 - \tilde{\delta}_j B(T_{j-1}, T_j) = \tilde{\delta}_j \bigl( \tilde{\delta}_j^{-1} - B(T_{j-1}, T_j) \bigr)$$

at time $T_{j-1}$. This shows once again that the $j$th caplet, with strike level $\kappa$ and nominal value 1, is essentially equivalent to a put option with strike price $(1 + \kappa\delta_j)^{-1}$ and nominal value $\tilde{\delta}_j = 1 + \kappa\delta_j$ written on the corresponding zero-coupon bond with maturity $T_j$.

The analysis of a floor contract can be done along similar lines. By definition, the $j$th floorlet pays $\delta_j (\kappa - L(T_{j-1}))^+$ at time $T_j$. Therefore,

$$FF_t = \sum_{j=1}^{n} \mathbb{E}_{\mathbb{P}^*}\Bigl( \frac{B_t}{B_{T_j}} (\kappa - L(T_{j-1}))^+ \delta_j \,\Big|\, \mathcal{F}_t \Bigr), \tag{2.43}$$

but also

$$FF_t = \sum_{j=1}^{n} B(t, T_{j-1})\, \mathbb{E}_{\mathbb{P}_{T_{j-1}}}\Bigl( \bigl( \tilde{\delta}_j B(T_{j-1}, T_j) - 1 \bigr)^+ \,\Big|\, \mathcal{F}_t \Bigr). \tag{2.44}$$
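The payoff identities behind the caplet and floorlet representations as bond options can be checked numerically. The short sketch below compares, at the reset date $T_{j-1}$, the discounted caplet (floorlet) payoff with the payoff of the corresponding bond put (call); the function names and the test values are illustrative assumptions.

```python
def bond_price(libor, delta):
    """B(T_{j-1}, T_j) implied by the spot Libor rate L(T_{j-1}), see (2.39)."""
    return 1.0 / (1.0 + delta * libor)


def caplet_value_at_reset(libor, strike, delta):
    """delta_j * (L - kappa)^+ due at T_j, discounted back to T_{j-1}."""
    return bond_price(libor, delta) * delta * max(libor - strike, 0.0)


def floorlet_value_at_reset(libor, strike, delta):
    """delta_j * (kappa - L)^+ due at T_j, discounted back to T_{j-1}."""
    return bond_price(libor, delta) * delta * max(strike - libor, 0.0)


def bond_put_payoff(libor, strike, delta):
    """(1 - delta~_j * B(T_{j-1}, T_j))^+ with delta~_j = 1 + kappa * delta_j."""
    return max(1.0 - (1.0 + strike * delta) * bond_price(libor, delta), 0.0)


def bond_call_payoff(libor, strike, delta):
    """(delta~_j * B(T_{j-1}, T_j) - 1)^+ with delta~_j = 1 + kappa * delta_j."""
    return max((1.0 + strike * delta) * bond_price(libor, delta) - 1.0, 0.0)


# Hypothetical scenario grid: strike 4%, semi-annual accrual period.
scenarios = [0.01, 0.03, 0.04, 0.05, 0.08]
gaps = [
    abs(caplet_value_at_reset(rate, 0.04, 0.5) - bond_put_payoff(rate, 0.04, 0.5))
    for rate in scenarios
]
```

Every entry of `gaps` should vanish up to rounding error, whichever side of the strike the realised rate falls on. The difference between the caplet and floorlet values equals $1 - \tilde{\delta}_j B(T_{j-1}, T_j)$ in every scenario, which is the single-period form of the cap–floor parity.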


Combining (2.40) with (2.43) (or (2.42) with (2.44)), we obtain the following cap–floor parity relationship

$$FC_t - FF_t = \sum_{j=1}^{n} \bigl( B(t, T_{j-1}) - \tilde{\delta}_j B(t, T_j) \bigr), \tag{2.45}$$

which is also an immediate consequence of the no-arbitrage property, so that it does not depend on the model’s choice.

2.4.1 Market valuation formula for caps and floors

The main motivation for the introduction of a lognormal model of Libor rates was the market practice of pricing caps and swaptions by means of Black–Scholes-like formulae. For this reason, we shall first describe how market practitioners value caps. The formulae commonly used by practitioners assume that the underlying instrument follows a geometric Brownian motion under some probability measure, $\mathbb{Q}$ say. Since the formal definition of this probability measure is not available, we shall informally refer to $\mathbb{Q}$ as the market probability. Let us consider an interest rate cap with expiry date $T$ and fixed strike level $\kappa$. Market practice is to price the option assuming that the underlying forward interest rate process is lognormally distributed with zero drift. Let us first consider a caplet, that is, one leg of a cap. Assume that the forward Libor rate $L(t, T)$, $t \in [0, T]$, for the accrual period of length $\delta$ follows a geometric Brownian motion under the “market probability”, $\mathbb{Q}$ say. More specifically,

$$dL(t, T) = L(t, T)\, \sigma\, dW_t, \tag{2.46}$$

where $W$ follows a one-dimensional standard Brownian motion under $\mathbb{Q}$, and $\sigma$ is a strictly positive constant. The unique solution of (2.46) is

$$L(t, T) = L(0, T) \exp\bigl( \sigma W_t - \tfrac{1}{2} \sigma^2 t \bigr), \quad \forall\, t \in [0, T], \tag{2.47}$$

where the initial condition is derived from the yield curve $Y(0, T)$, namely

$$1 + \delta L(0, T) = \frac{B(0, T)}{B(0, T + \delta)} = \exp\bigl( (T + \delta) Y(0, T + \delta) - T\, Y(0, T) \bigr).$$

The “market price” at time $t$ of a caplet with expiry date $T$ and strike level $\kappa$ is calculated by means of the formula

$$FC_t = \delta B(t, T + \delta)\, \mathbb{E}_{\mathbb{Q}}\bigl( (L(T, T) - \kappa)^+ \mid \mathcal{F}_t \bigr).$$

More explicitly, for any $t \in [0, T]$ we have

$$FC_t = \delta B(t, T + \delta) \Bigl( L(t, T) N\bigl( \hat{e}_1(t, T) \bigr) - \kappa N\bigl( \hat{e}_2(t, T) \bigr) \Bigr), \tag{2.48}$$


where $N$ is the standard Gaussian cumulative distribution function

$$N(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\, dz, \quad \forall\, x \in \mathbb{R},$$

and

$$\hat{e}_{1,2}(t, T) = \frac{\ln(L(t, T)/\kappa) \pm \frac{1}{2} \hat{v}_0^2(t, T)}{\hat{v}_0(t, T)}$$

with $\hat{v}_0^2(t, T) = \sigma^2 (T - t)$. This means that market practitioners price caplets using Black’s formula, with discount from the settlement date $T + \delta$. A cap settled in arrears at times $T_j$, $j = 1, \ldots, n$, where $T_j - T_{j-1} = \delta_j$, $T_0 = T$, is priced by the formula

$$FC_t = \sum_{j=1}^{n} \delta_j B(t, T_j) \Bigl( L(t, T_{j-1}) N\bigl( \hat{e}_1^j(t) \bigr) - \kappa N\bigl( \hat{e}_2^j(t) \bigr) \Bigr), \tag{2.49}$$

where for every $j = 1, \ldots, n$

$$\hat{e}_{1,2}^j(t) = \frac{\ln(L(t, T_{j-1})/\kappa) \pm \frac{1}{2} \hat{v}_j^2(t)}{\hat{v}_j(t)} \tag{2.50}$$

and $\hat{v}_j^2(t) = (T_{j-1} - t) \sigma_j^2$ for some constants $\sigma_j$, $j = 1, \ldots, n$. Apparently, the market assumes that for any maturity $T_j$, the corresponding forward Libor rate has a lognormal probability law under the “market probability”. The value of a floor can be easily derived by combining (2.49)–(2.50) with the cap–floor parity relationship (2.45). As we shall see in what follows, the valuation formulae obtained for caps and floors in the lognormal model of forward Libor rates agree with the market practice.

2.4.2 Valuation in the lognormal model of forward Libor rates

We shall now examine the valuation of caps within the lognormal model of forward Libor rates of Section 2.2.3. The dynamics of the forward Libor rate $L(t, T_{j-1})$ under the forward probability measure $\mathbb{P}_{T_j}$ are

$$dL(t, T_{j-1}) = L(t, T_{j-1})\, \lambda(t, T_{j-1}) \cdot dW_t^{T_j}, \tag{2.51}$$

where $W^{T_j}$ follows a $d$-dimensional Brownian motion under the forward measure $\mathbb{P}_{T_j}$, and $\lambda(\cdot, T_{j-1}) : [0, T_{j-1}] \to \mathbb{R}^d$ is a deterministic function. Consequently, for every $t \in [0, T_{j-1}]$ we have

$$L(t, T_{j-1}) = L(0, T_{j-1})\, \mathcal{E}_t\Bigl( \int_0^{\cdot} \lambda(u, T_{j-1}) \cdot dW_u^{T_j} \Bigr).$$

In the present setup, the cap valuation formula (2.52) was first established by Miltersen et al. (1997), who focused on the dynamics of the forward Libor rate for a given date. Equality (2.52) was subsequently rederived through a probabilistic approach in Goldys (1997) and Rady (1997). Finally, the same result was established by means of the forward measure approach in Brace et al. (1997). The following proposition is a consequence of formula (2.41), combined with the dynamics (2.51). As before, $N$ is the standard Gaussian probability distribution function.

Proposition 2.9 Consider an interest rate cap with strike level $\kappa$, settled in arrears at times $T_j$, $j = 1, \ldots, n$. Assuming the lognormal model of Libor rates, the price of a cap at time $t \in [0, T]$ equals

$$FC_t = \sum_{j=1}^{n} \delta_j B(t, T_j) \Bigl( L(t, T_{j-1}) N\bigl( \tilde{e}_1^j(t) \bigr) - \kappa N\bigl( \tilde{e}_2^j(t) \bigr) \Bigr) = \sum_{j=1}^{n} FC_t^j, \tag{2.52}$$

where $FC_t^j$ stands for the price at time $t$ of the $j$th caplet for $j = 1, \ldots, n$,

$$\tilde{e}_{1,2}^j(t) = \frac{\ln(L(t, T_{j-1})/\kappa) \pm \frac{1}{2} \tilde{v}_j^2(t)}{\tilde{v}_j(t)}$$

and

$$\tilde{v}_j^2(t) = \int_t^{T_{j-1}} |\lambda(u, T_{j-1})|^2\, du.$$

Proof We fix $j$ and we consider the $j$th caplet. It is clear that its payoff at time $T_j$ admits the representation

$$FC^j_{T_j} = \delta_j (L(T_{j-1}) - \kappa)^+ = \delta_j L(T_{j-1}) \mathbf{1}_D - \delta_j \kappa \mathbf{1}_D, \tag{2.53}$$

where $D = \{ L(T_{j-1}) > \kappa \}$ is the exercise set. Since the caplet settles at time $T_j$, it is convenient to use the forward measure $\mathbb{P}_{T_j}$ to find its arbitrage price. We have

$$FC^j_t = B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\bigl( FC^j_{T_j} \mid \mathcal{F}_t \bigr), \quad \forall\, t \in [0, T_j].$$

Obviously, it is enough to find the value of a caplet for $t \in [0, T_{j-1}]$. In view of (2.53), it is clear that we need to evaluate the following conditional expectations:

$$FC^j_t = \delta_j B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\bigl( L(T_{j-1}) \mathbf{1}_D \mid \mathcal{F}_t \bigr) - \kappa \delta_j B(t, T_j)\, \mathbb{P}_{T_j}( D \mid \mathcal{F}_t ) = \delta_j B(t, T_j) (I_1 - I_2),$$

where the meaning of $I_1$ and $I_2$ is obvious from the context. Recall that $L(T_{j-1})$ is given by the formula

$$L(T_{j-1}) = L(t, T_{j-1}) \exp\Bigl( \int_t^{T_{j-1}} \lambda(u, T_{j-1}) \cdot dW_u^{T_j} - \frac{1}{2} \int_t^{T_{j-1}} |\lambda(u, T_{j-1})|^2\, du \Bigr).$$

Since $\lambda(\cdot, T_{j-1})$ is a deterministic function, the probability law under $\mathbb{P}_{T_j}$ of the Itô integral

$$\zeta(t, T_{j-1}) = \int_t^{T_{j-1}} \lambda(u, T_{j-1}) \cdot dW_u^{T_j}$$

is Gaussian, with zero mean and the variance

$$\mathrm{Var}_{\mathbb{P}_{T_j}}\bigl( \zeta(t, T_{j-1}) \bigr) = \int_t^{T_{j-1}} |\lambda(u, T_{j-1})|^2\, du.$$

Therefore, it is straightforward to show that¹²

$$I_2 = \kappa N\Biggl( \frac{\ln L(t, T_{j-1}) - \ln \kappa - \frac{1}{2} \tilde{v}_j^2(t)}{\tilde{v}_j(t)} \Biggr).$$

To evaluate $I_1$, we introduce an auxiliary probability measure $\hat{\mathbb{P}}_{T_j}$, equivalent to $\mathbb{P}_{T_j}$ on $(\Omega, \mathcal{F}_{T_{j-1}})$, by setting

$$\frac{d\hat{\mathbb{P}}_{T_j}}{d\mathbb{P}_{T_j}} = \mathcal{E}_{T_{j-1}}\Bigl( \int_0^{\cdot} \lambda(u, T_{j-1}) \cdot dW_u^{T_j} \Bigr).$$

Then the process $\hat{W}^{T_j}$ given by the formula

$$\hat{W}_t^{T_j} = W_t^{T_j} - \int_0^t \lambda(u, T_{j-1})\, du, \quad \forall\, t \in [0, T_{j-1}],$$

follows the $d$-dimensional standard Brownian motion under $\hat{\mathbb{P}}_{T_j}$. Furthermore, the rate $L(T_{j-1})$ admits the representation under $\hat{\mathbb{P}}_{T_j}$, for $t \in [0, T_{j-1}]$,

$$L(T_{j-1}) = L(t, T_{j-1}) \exp\Bigl( \int_t^{T_{j-1}} \lambda_{j-1}(u) \cdot d\hat{W}_u^{T_j} + \frac{1}{2} \int_t^{T_{j-1}} |\lambda_{j-1}(u)|^2\, du \Bigr),$$

where we set $\lambda_{j-1}(u) = \lambda(u, T_{j-1})$. Since

$$I_1 = L(t, T_{j-1})\, \mathbb{E}_{\mathbb{P}_{T_j}}\Bigl( \mathbf{1}_D \exp\Bigl( \int_t^{T_{j-1}} \lambda_{j-1}(u) \cdot dW_u^{T_j} - \frac{1}{2} \int_t^{T_{j-1}} |\lambda_{j-1}(u)|^2\, du \Bigr) \,\Big|\, \mathcal{F}_t \Bigr),$$

from the abstract Bayes rule, we get $I_1 = L(t, T_{j-1})\, \hat{\mathbb{P}}_{T_j}( D \mid \mathcal{F}_t )$. Arguing in much the same way as for $I_2$, we thus obtain

$$I_1 = L(t, T_{j-1})\, N\Biggl( \frac{\ln L(t, T_{j-1}) - \ln \kappa + \frac{1}{2} \tilde{v}_j^2(t)}{\tilde{v}_j(t)} \Biggr).$$

This completes the proof of the proposition.

¹² See, for instance, the proof of the Black–Scholes formula in Musiela and Rutkowski (1997a).


Once again, to derive the floors valuation formula, it is enough to make use of the cap–floor parity (2.45).

2.4.3 Hedging of caps and floors

It is clear that the replicating strategy for a cap is a simple sum of replicating strategies for caplets. Therefore, it is enough to focus on a particular caplet. Let us denote by $F_C(t, T_j)$ the forward price of the $j$th caplet for the settlement date $T_j$. From (2.52), it is clear that

$$F_C(t, T_j) = \delta_j \Bigl( L(t, T_{j-1}) N\bigl( \tilde{e}_1^j(t) \bigr) - \kappa N\bigl( \tilde{e}_2^j(t) \bigr) \Bigr),$$

so that an application of Itô’s formula yields¹³

$$dF_C(t, T_j) = \delta_j N\bigl( \tilde{e}_1^j(t) \bigr)\, dL(t, T_{j-1}). \tag{2.54}$$

Let us consider the following self-financing trading strategy in the $T_j$-forward market. We start our trade at time 0 with $F_C(0, T_j)$ units of zero-coupon bonds.¹⁴ At any time $t \le T_{j-1}$ we assume $\psi_t^j = N(\tilde{e}_1^j(t))$ positions in forward rate agreements (that is, single-period forward swaps) over the period $[T_{j-1}, T_j]$. The associated gains/losses process $V$, in the $T_j$-forward market,¹⁵ satisfies¹⁶

$$dV_t = \delta_j \psi_t^j\, dL(t, T_{j-1}) = \delta_j N\bigl( \tilde{e}_1^j(t) \bigr)\, dL(t, T_{j-1}) = dF_C(t, T_j)$$

with $V_0 = 0$. Consequently,

$$F_C(T_{j-1}, T_j) = F_C(0, T_j) + \int_0^{T_{j-1}} \delta_j \psi_t^j\, dL(t, T_{j-1}) = F_C(0, T_j) + V_{T_{j-1}}.$$

¹³ The calculations here are essentially the same as in the classic Black–Scholes model.
¹⁴ We need thus to invest $FC_0^j = F_C(0, T_j) B(0, T_j)$ of cash at time 0.
¹⁵ That is, with the value expressed in units of $T_j$-maturity zero-coupon bonds.
¹⁶ To get a more intuitive insight in this formula, it is advisable to consider first a discretized version of $\psi$.

It should be stressed that dynamic trading takes place on the interval $[0, T_{j-1}]$ only; the gains/losses (involving the initial investment) are incurred at time $T_j$, however. All quantities in the last formula are expressed in units of $T_j$-maturity zero-coupon bonds. Also, the caplet’s payoff is known already at time $T_{j-1}$, so that it is completely specified by its forward price $F_C(T_{j-1}, T_j) = FC^j_{T_{j-1}} / B(T_{j-1}, T_j)$. Therefore the last equality makes it clear that the strategy $\psi$ introduced above does indeed replicate the $j$th caplet.

It should be observed that formally the replicating strategy has also a second component, $\eta_t^j$ say, which represents the number of forward contracts on a $T_j$-maturity bond, with the settlement date $T_j$. Since obviously $F_B(t, T_j, T_j) = 1$ for every $t \le T_j$, so that $dF_B(t, T_j, T_j) = 0$, for the $T_j$-forward value of our strategy, we get

$$\tilde{V}_t(\psi^j, \eta^j) = \eta_t^j = F_C(t, T_j)$$

and

$$d\tilde{V}_t(\psi^j, \eta^j) = \psi_t^j \delta_j\, dL(t, T_{j-1}) + \eta_t^j\, dF_B(t, T_j, T_j) = \delta_j N\bigl( \tilde{e}_1^j(t) \bigr)\, dL(t, T_{j-1}).$$

It should be stressed, however, that with the exception of the initial investment at time 0 in $T_j$-maturity bonds, no bond trading is required for the caplet’s replication. In practical terms, the hedging of a cap within the framework of the lognormal model of forward Libor rates is done exclusively through dynamic trading in the underlying single-period swaps. Of course, the same remarks (and similar calculations) apply also to floors. In this interpretation, the component $\eta^j$ simply represents the future (i.e., as of time $T_{j-1}$) effects of a continuous trading in forward contracts. Alternatively, the hedging of a cap can be done in the spot (i.e., cash) market, using two simple portfolios of bonds. Indeed, it is easily seen that for the process

$$V_t(\psi^j, \eta^j) = B(t, T_j)\, \tilde{V}_t(\psi^j, \eta^j) = FC_t^j$$

we have

$$V_t(\psi^j, \eta^j) = \psi_t^j \bigl( B(t, T_{j-1}) - B(t, T_j) \bigr) + \eta_t^j B(t, T_j)$$

and j j d Vt (ψ j , η j ) = ψ t d B(t, T j−1 ) − B(t, T j ) + ηt d B(t, T j ) j j = N e˜1 (t) d B(t, T j−1 ) − B(t, T j ) + ηt d B(t, T j ). This means that the components ψ j and η j now represent the number of units of portfolios B(t, T j−1 ) − B(t, T j ) and B(t, T j ) held at time t. 2.4.4 Bond options We shall now give the bond option valuation formula within the framework of the lognormal model of forward Libor rates. This result was first obtained by Rady and Sandmann (1994), who adopted the PDE approach and who worked in a different setup (see also Goldys (1997), Miltersen et al. (1997), and Rady (1997)). In the present framework, it is an immediate consequence of (2.52) combined with (2.42). Proposition 2.10 The price Ct at time t ≤ T j−1 of a European call option, with expiration date T j−1 and strike price 0 < K < 1, written on a zero-coupon bond maturing at T j = T j−1 + δ j , equals j j Ct = (1 − K )B(t, T j )N l1 (t) − K (B(t, T j−1 ) − B(t, T j ))N l2 (t) , (2.55) where j l1,2 (t)

ln((1 − K )B(t, T j )) − ln K B(t, T j−1 ) − B(t, T j ) ± 12 v˜ j (t) = v˜ j (t)

10. Modelling of Forward Libor and Swap Rates

and

v˜ 2j (t)

=

T j−1

367

|λ(u, T j−1 )|2 du.

t

In view of (2.55), it is apparent that the replication of the bond option using the underlying bonds of maturity T j−1 and T j is rather involved. This should be contrasted with the case of the Gaussian Heath–Jarrow–Morton model17 in which hedging of bond options with the use of the underlying bonds is straightforward. This illustrates the general feature that each particular way of modelling the term structure is tailored to the specific class of derivatives and hedging instruments. 3 Modelling of forward swap rates We shall first describe the most typical swap contracts and related options (the so-called swaptions). Subsequently, we shall present a model of forward swap rates put forward by Jamshidian (1996, 1997). For the sake of expositional convenience, we shall follow the backward induction approach due to Rutkowski (1999), however. 3.1 Interest rate swaps Let us consider a forward (start) payer swap (that is, fixed-for-floating interest rate swap) settled in arrears, with notional principal N p . As before, we consider a finite collection of dates 0 < T0 < T1 < · · · < Tn so that δ j = T j − T j−1 > 0 for every j = 1, . . . , n. The floating rate L(T j−1 ) received at time T j is set at time T j−1 by reference to the price of a zero-coupon bond over the period [T j−1 , T j ]. More specifically, L(T j−1 ) is the spot Libor rate prevailing at time T j−1 , so that it satisfies B(T j−1 , T j )−1 = 1 + (T j − T j−1 )L(T j−1 ) = 1 + δ j L(T j−1 ).

(3.1)

Recall that in general, the forward Libor rate L(t, T j−1 ) for the future time period [T j−1 , T j ] of length δ j satisfies 1 + δ j L(t, T j−1 ) =

B(t, T j−1 ) = FB (t, T j−1 , T j ), B(t, T j )

(3.2)

so that $L(T_{j-1})$ coincides with $L(T_{j-1}, T_{j-1})$. At any date $T_j$, $j = 1, \dots, n$, the cash flows of a forward payer swap are $N_p L(T_{j-1})\delta_j$ and $-N_p \kappa \delta_j$, where $\kappa$ is a preassigned fixed rate of interest (the cash flows of a forward receiver swap have the same size, but opposite signs). The number $n$, which coincides with the number of payments, is referred to as the length of a swap (for instance, the length of a three-year swap with quarterly settlement equals $n = 12$). The dates $T_0, \dots, T_{n-1}$ are known as reset dates, and the dates $T_1, \dots, T_n$ as settlement dates. We shall refer to the first reset date $T_0$ as the start date of a swap. Finally, the time interval $[T_{j-1}, T_j]$ is referred to as the $j$th accrual period.

17 In such a model the forward prices of bonds follow lognormal processes.

We may and do assume, without loss of generality, that the notional principal $N_p = 1$. The value at time $t$ of a forward payer swap, which is denoted by $FS_t$ or $FS_t(\kappa)$, equals
\[
FS_t(\kappa) = \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_t}{B_{T_j}} (L(T_{j-1}) - \kappa)\delta_j \,\Big|\, \mathcal{F}_t \Big). \tag{3.3}
\]
Since
\[
L(t, T_{j-1}) = \frac{B(t, T_{j-1}) - B(t, T_j)}{\delta_j B(t, T_j)},
\]
it is clear that the process $L(\cdot, T_{j-1})$ follows a martingale under the forward martingale measure $\mathbb{P}_{T_j}$. Therefore
\begin{align*}
FS_t(\kappa) &= \sum_{j=1}^{n} B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( (L(T_{j-1}) - \kappa)\delta_j \,\big|\, \mathcal{F}_t \big) \\
&= \sum_{j=1}^{n} B(t, T_j)\, (L(t, T_{j-1}) - \kappa)\delta_j \\
&= \sum_{j=1}^{n} \big( B(t, T_{j-1}) - B(t, T_j) - \kappa\delta_j B(t, T_j) \big).
\end{align*}
After rearranging, this yields
\[
FS_t(\kappa) = B(t, T_0) - \sum_{j=1}^{n} c_j B(t, T_j) \tag{3.4}
\]

for every $t \in [0, T]$, where $c_j = \kappa\delta_j$ for $j = 1, \dots, n-1$, and $c_n = \tilde{\delta}_n = 1 + \kappa\delta_n$. The last equality makes clear that a forward payer swap settled in arrears is, essentially, a contract to deliver a specific coupon-bearing bond and to receive at the same time a zero-coupon bond. Relationship (3.4) may also be established through a straightforward comparison of the future cash flows from these bonds. Note that (3.4) provides a simple method for the replication of a swap contract, independent of the term structure model.

In the forward payer swap settled in advance (that is, in which each reset date is also a settlement date) the discounting method varies from country to country. In the U.S. and in many European markets, the cash flows of a swap settled in advance at reset dates $T_j$, $j = 0, \dots, n-1$, are $L(T_j)\delta_{j+1}(1 + L(T_j)\delta_{j+1})^{-1}$ and $-\kappa\delta_{j+1}(1 + L(T_j)\delta_{j+1})^{-1}$. Therefore the value $FS^{**}_t(\kappa)$ at time $t$ of this swap is
\begin{align*}
FS^{**}_t(\kappa) &= \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=0}^{n-1} \frac{B_t}{B_{T_j}}\, \frac{\delta_{j+1}(L(T_j) - \kappa)}{1 + \delta_{j+1} L(T_j)} \,\Big|\, \mathcal{F}_t \Big) \\
&= \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=0}^{n-1} \frac{B_t}{B_{T_j}}\, (L(T_j) - \kappa)\delta_{j+1} B(T_j, T_{j+1}) \,\Big|\, \mathcal{F}_t \Big) \\
&= \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=0}^{n-1} \frac{B_t}{B_{T_{j+1}}}\, (L(T_j) - \kappa)\delta_{j+1} \,\Big|\, \mathcal{F}_t \Big),
\end{align*}
which coincides with the value of the swap settled in arrears. Once again, this is by no means surprising, since the payoffs $L(T_j)\delta_{j+1}(1 + L(T_j)\delta_{j+1})^{-1}$ and $-\kappa\delta_{j+1}(1 + L(T_j)\delta_{j+1})^{-1}$ at time $T_j$ are easily seen to be equivalent to payoffs $L(T_j)\delta_{j+1}$ and $-\kappa\delta_{j+1}$ respectively at time $T_{j+1}$ (recall that $1 + L(T_j)\delta_{j+1} = B^{-1}(T_j, T_{j+1})$). In what follows, we shall restrict our attention to interest rate swaps settled in arrears.

As mentioned, a swap agreement is worthless at initiation. This important feature of a swap leads to the following definition, which refers in fact to the more general concept of a forward swap. Basically, a forward swap rate is that fixed rate of interest which makes a forward swap worthless.

Definition 3.1 The forward swap rate $\kappa(t, T_0, n)$ at time $t$ for the date $T_0$ is that value of the fixed rate $\kappa$ which makes the value of the forward swap zero, i.e., that value of $\kappa$ for which $FS_t(\kappa) = 0$.

Using (3.4), we obtain
\[
\kappa(t, T_0, n) = \big( B(t, T_0) - B(t, T_n) \big) \Big( \sum_{j=1}^{n} \delta_j B(t, T_j) \Big)^{-1}. \tag{3.5}
\]

A swap (swap rate, respectively) is the forward swap (forward swap rate, respectively) with $t = T_0$. The swap rate, $\kappa(T_0, T_0, n)$, equals
\[
\kappa(T_0, T_0, n) = \big( 1 - B(T_0, T_n) \big) \Big( \sum_{j=1}^{n} \delta_j B(T_0, T_j) \Big)^{-1}. \tag{3.6}
\]
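Formulae (3.4)–(3.6) depend only on the zero-coupon bond prices, so they are easy to check numerically. The following is a minimal sketch, not taken from the original text: the function names, the flat 4% continuously compounded curve and the quarterly tenor are illustrative assumptions.

```python
import math
from typing import Sequence


def forward_swap_value(bonds: Sequence[float], deltas: Sequence[float], kappa: float) -> float:
    """Value (3.4) of a unit-notional forward payer swap settled in arrears.

    bonds[j] = B(t, T_j) for j = 0..n, deltas[j-1] = delta_j for j = 1..n.
    """
    coupons = [kappa * d for d in deltas]
    coupons[-1] += 1.0  # c_n = 1 + kappa * delta_n
    return bonds[0] - sum(c * bonds[j] for j, c in enumerate(coupons, start=1))


def forward_swap_rate(bonds: Sequence[float], deltas: Sequence[float]) -> float:
    """Forward swap rate (3.5): (B(t,T_0) - B(t,T_n)) / sum_j delta_j B(t,T_j)."""
    annuity = sum(d * bonds[j] for j, d in enumerate(deltas, start=1))
    return (bonds[0] - bonds[-1]) / annuity


# Hypothetical market data: flat 4% curve, T_0 = 1, quarterly dates up to T_4 = 2.
dates = [1.0 + 0.25 * j for j in range(5)]
bonds = [math.exp(-0.04 * T) for T in dates]
deltas = [dates[j] - dates[j - 1] for j in range(1, 5)]

kappa_fwd = forward_swap_rate(bonds, deltas)
print(kappa_fwd, forward_swap_value(bonds, deltas, kappa_fwd))
```

By construction, the swap value evaluated at the forward swap rate vanishes up to rounding, and a payer swap with a fixed rate below the forward swap rate has positive value.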

Note that the definition of a forward swap rate implicitly refers to a swap contract of length $n$ which starts at time $T_0$. It would thus be more correct to refer to $\kappa(t, T_0, n)$ as the $n$-period forward swap rate prevailing at time $t$, for the future date $T_0$. A forward swap rate is a rather theoretical concept, as opposed to swap rates, which are quoted daily (subject to an appropriate bid–ask spread) by financial institutions that offer interest rate swap contracts to their institutional clients. In practice, swap agreements of various lengths are offered. Also, typically, the length of the reference period varies over time; for instance, a five-year swap may be settled quarterly during the first three years, and semi-annually during the last two. Swap rates also play an important role as a basis for several derivative instruments. For instance, an appropriate swap rate is commonly used as a strike level for an option written on the value of a swap; that is, a swaption.

Finally, it will be useful to express the value at time $t$ of a given forward swap with fixed rate $\kappa$ in terms of the current value of the forward swap rate. Since obviously $FS_t(\kappa(t, T_0, n)) = 0$, using (3.4), we get
\[
FS_t(\kappa) = FS_t(\kappa) - FS_t(\kappa(t, T_0, n)) = \sum_{j=1}^{n} (\kappa(t, T_0, n) - \kappa)\, \delta_j B(t, T_j). \tag{3.7}
\]

3.2 The lognormal model of forward swap rates

The lognormal model of forward swap rates was developed by Jamshidian (1996, 1997). In this section, we follow Rutkowski (1999). We assume, as before, that the tenor structure $0 < T_0 < T_1 < \cdots < T_n = T^*$ is given. Recall that $\delta_j = T_j - T_{j-1}$ for $j = 1, \dots, n$, and thus $T_j = \sum_{i=0}^{j} \delta_i$ for every $j = 0, \dots, n$ (with the convention $\delta_0 = T_0$). For any fixed $j$, we consider a fixed-for-floating forward (payer) swap which starts at time $T_j$ and has $n - j$ accrual periods, whose consecutive lengths are $\delta_{j+1}, \dots, \delta_n$. The fixed interest rate paid at each of the reset dates $T_l$ for $l = j+1, \dots, n$ equals $\kappa$, and the corresponding floating rate, $L(T_l)$, is found using the formula
\[
B(T_l, T_{l+1})^{-1} = 1 + (T_{l+1} - T_l) L(T_l) = 1 + \delta_{l+1} L(T_l),
\]
i.e., it coincides with the Libor rate $L(T_l, T_l)$. It is not difficult to check, using no-arbitrage arguments, that the value of such a swap equals, for $t \in [0, T_j]$ (by convention, the notional principal equals 1)
\[
FS_t(\kappa) = B(t, T_j) - \sum_{l=j+1}^{n} c_l B(t, T_l),
\]
where $c_l = \kappa\delta_l$ for $l = j+1, \dots, n-1$, and $c_n = 1 + \kappa\delta_n$. Consequently, the associated forward swap rate, $\kappa(t, T_j, n-j)$, that is, that value of a fixed rate $\kappa$ for which such a swap is worthless at time $t$, is given by the formula
\[
\kappa(t, T_j, n-j) = \frac{B(t, T_j) - B(t, T_n)}{\delta_{j+1} B(t, T_{j+1}) + \cdots + \delta_n B(t, T_n)} \tag{3.8}
\]

for every $t \in [0, T_j]$, $j = 0, \dots, n-1$. In this section, we consider the family of forward swap rates $\tilde{\kappa}(t, T_j) = \kappa(t, T_j, n-j)$ for $j = 0, \dots, n-1$. Let us stress that the underlying swap agreements differ in length; however, they all have a common expiration date, $T^* = T_n$.

Suppose momentarily that we are given a family of bond prices $B(t, T_m)$, $m = 1, \dots, n$, on a filtered probability space $(\Omega, \mathbb{F}, \mathbb{P})$ equipped with a Brownian motion $W$. As in Section 2.1, we find it convenient to postulate that $\mathbb{P} = \mathbb{P}_{T^*}$ is the forward measure for the date $T^*$, and the process $W = W^{T^*}$ is the corresponding Brownian motion. For any $m = 1, \dots, n-1$, we introduce the fixed-maturity coupon process $G(m)$ by setting (recall that $T_l^* = T_{n-l}$, in particular, $T_0^* = T_n$)
\[
G_t(m) = \sum_{l=n-m+1}^{n} \delta_l B(t, T_l) = \sum_{k=0}^{m-1} \delta_{n-k} B(t, T_k^*) \tag{3.9}
\]

for $t \in [0, T_{n-m+1}]$. A forward swap measure is that probability measure, equivalent to $\mathbb{P}$, which corresponds to the choice of the fixed-maturity coupon process as a numeraire asset. We have the following definition.

Definition 3.2 For $j = 0, \dots, n$, a probability measure $\tilde{\mathbb{P}}_{T_j}$ on $(\Omega, \mathcal{F}_{T_j})$, equivalent to $\mathbb{P}$, is said to be the fixed-maturity forward swap measure for the date $T_j$ if, for every $k = 0, \dots, n$, the relative bond price
\[
Z_{n-j+1}(t, T_k) := \frac{B(t, T_k)}{G_t(n-j+1)} = \frac{B(t, T_k)}{\delta_j B(t, T_j) + \cdots + \delta_n B(t, T_n)}, \quad t \in [0, T_k \wedge T_j],
\]
follows a local martingale under $\tilde{\mathbb{P}}_{T_j}$.

Put another way, for any fixed $m = 1, \dots, n+1$, the relative bond prices
\[
Z_m(t, T_k^*) = \frac{B(t, T_k^*)}{G_t(m)} = \frac{B(t, T_k^*)}{\delta_{n-m+1} B(t, T_{m-1}^*) + \cdots + \delta_n B(t, T^*)}, \quad t \in [0, T_k^* \wedge T_{m-1}^*],
\]
are bound to follow local martingales under the forward swap measure $\tilde{\mathbb{P}}_{T_{m-1}^*}$. It follows immediately from (3.8) that the forward swap rate for the date $T_m^*$ equals, for $t \in [0, T_m^*]$,
\[
\tilde{\kappa}(t, T_m^*) = \frac{B(t, T_m^*) - B(t, T^*)}{\delta_{n-m+1} B(t, T_{m-1}^*) + \cdots + \delta_n B(t, T^*)},
\]
or, equivalently,
\[
\tilde{\kappa}(t, T_m^*) = Z_m(t, T_m^*) - Z_m(t, T^*).
\]
Therefore $\tilde{\kappa}(\cdot, T_m^*)$ also follows a local martingale under the forward swap measure $\tilde{\mathbb{P}}_{T_{m-1}^*}$. Moreover, since obviously $G_t(1) = \delta_n B(t, T^*)$, it is evident that $Z_1(t, T_k^*) = \delta_n^{-1} F_B(t, T_k^*, T^*)$, and thus the probability measure $\tilde{\mathbb{P}}_{T^*}$ can be chosen to coincide with the forward martingale measure $\mathbb{P}_{T^*}$.

Our aim is to construct a model of forward swap rates through backward induction. As one might expect, the underlying bond price processes will not be explicitly specified. We make the following standing assumptions.

Assumptions (SR) We assume that we are given a family of bounded adapted processes $\nu(\cdot, T_j)$, $j = 0, \dots, n-1$, which represent the volatilities of forward swap rates $\tilde{\kappa}(\cdot, T_j)$. In addition, we are given an initial term structure of interest rates, specified by a family $B(0, T_j)$, $j = 0, \dots, n$, of bond prices. We assume that $B(0, T_j) > B(0, T_{j+1})$ for $j = 0, \dots, n-1$.

We wish to construct a family of forward swap rates in such a way that
\[
d\tilde{\kappa}(t, T_j) = \tilde{\kappa}(t, T_j)\, \nu(t, T_j) \cdot d\tilde{W}_t^{T_{j+1}} \tag{3.10}
\]
for any $j = 0, \dots, n-1$, where each process $\tilde{W}^{T_{j+1}}$ follows a standard Brownian motion under the corresponding forward swap measure $\tilde{\mathbb{P}}_{T_{j+1}}$. The model should also be consistent with the initial term structure of interest rates, meaning that
\[
\tilde{\kappa}(0, T_j) = \frac{B(0, T_j) - B(0, T^*)}{\delta_{j+1} B(0, T_{j+1}) + \cdots + \delta_n B(0, T_n)}. \tag{3.11}
\]
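The initial condition (3.11) can be evaluated for all start dates at once by accumulating the coupon process backwards from $T_n$. The sketch below is illustrative only; the function name and the bond prices are hypothetical and are chosen merely to satisfy $B(0, T_j) > B(0, T_{j+1})$ from Assumptions (SR).

```python
from typing import List, Sequence


def coterminal_swap_rates(bonds: Sequence[float], deltas: Sequence[float]) -> List[float]:
    """Initial forward swap rates kappa~(0, T_j), j = 0..n-1, as in (3.11).

    bonds[j] = B(0, T_j) for j = 0..n, deltas[j-1] = delta_j for j = 1..n.
    """
    n = len(bonds) - 1
    annuity = 0.0
    rates = []
    for j in range(n - 1, -1, -1):  # j = n-1, n-2, ..., 0
        # After this update, annuity = delta_{j+1} B(0,T_{j+1}) + ... + delta_n B(0,T_n).
        annuity += deltas[j] * bonds[j + 1]
        rates.append((bonds[j] - bonds[n]) / annuity)
    rates.reverse()  # rates[j] now corresponds to start date T_j
    return rates


# Hypothetical initial term structure with semi-annual accrual periods.
bonds = [0.97, 0.955, 0.94, 0.924, 0.908]
deltas = [0.5, 0.5, 0.5, 0.5]
rates = coterminal_swap_rates(bonds, deltas)
print(rates)
```

The last entry is the swap rate of the one-period swap starting at $T_{n-1}$, which coincides with the forward Libor rate $L(0, T_{n-1})$; strictly decreasing bond prices guarantee that every rate in the family is strictly positive.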

We proceed by backward induction. The first step is to introduce the forward swap rate for the date $T_1^*$ by postulating that the forward swap rate $\tilde{\kappa}(\cdot, T_1^*)$ solves the SDE
\[
d\tilde{\kappa}(t, T_1^*) = \tilde{\kappa}(t, T_1^*)\, \nu(t, T_1^*) \cdot d\tilde{W}_t^{T^*}, \quad \forall\, t \in [0, T_1^*], \tag{3.12}
\]
where $\tilde{W}^{T^*} = W^{T^*} = W$, with the initial condition
\[
\tilde{\kappa}(0, T_1^*) = \frac{B(0, T_1^*) - B(0, T^*)}{\delta_n B(0, T^*)}.
\]

To specify the process $\tilde{\kappa}(\cdot, T_2^*)$, we need first to introduce a forward swap measure $\tilde{\mathbb{P}}_{T_1^*}$ and an associated Brownian motion $\tilde{W}^{T_1^*}$. To this end, notice that each process $Z_1(\cdot, T_k^*) = B(\cdot, T_k^*)/\delta_n B(\cdot, T^*)$ follows a strictly positive local martingale under $\tilde{\mathbb{P}}_{T^*} = \mathbb{P}_{T^*}$. More specifically, we have
\[
dZ_1(t, T_k^*) = Z_1(t, T_k^*)\, \gamma_1(t, T_k^*) \cdot d\tilde{W}_t^{T^*} \tag{3.13}
\]
for some adapted process $\gamma_1(\cdot, T_k^*)$. According to the definition of a fixed-maturity forward swap measure, we postulate that for every $k$ the process
\[
Z_2(t, T_k^*) = \frac{B(t, T_k^*)}{\delta_{n-1} B(t, T_1^*) + \delta_n B(t, T^*)} = \frac{Z_1(t, T_k^*)}{1 + \delta_{n-1} Z_1(t, T_1^*)}
\]
follows a local martingale under $\tilde{\mathbb{P}}_{T_1^*}$. Applying Lemma 2.3 to processes $G = Z_1(\cdot, T_k^*)$ and $H = \delta_{n-1} Z_1(\cdot, T_1^*)$, it is easy to see that for this property to hold, it suffices to assume that the process $\tilde{W}^{T_1^*}$, which is given by the formula
\[
\tilde{W}_t^{T_1^*} = \tilde{W}_t^{T^*} - \int_0^t \frac{\delta_{n-1} Z_1(u, T_1^*)}{1 + \delta_{n-1} Z_1(u, T_1^*)}\, \gamma_1(u, T_1^*)\, du, \quad t \in [0, T_1^*],
\]
follows a Brownian motion under $\tilde{\mathbb{P}}_{T_1^*}$ (the probability measure $\tilde{\mathbb{P}}_{T_1^*}$ is yet unspecified, but will be soon found through Girsanov's theorem). Note that
\[
Z_1(t, T_1^*) = \frac{B(t, T_1^*)}{\delta_n B(t, T^*)} = \tilde{\kappa}(t, T_1^*) + Z_1(t, T^*) = \tilde{\kappa}(t, T_1^*) + \delta_n^{-1}.
\]
Differentiating both sides of the last equality, we get (cf. (3.12) and (3.13))
\[
Z_1(t, T_1^*)\, \gamma_1(t, T_1^*) = \tilde{\kappa}(t, T_1^*)\, \nu(t, T_1^*).
\]
Consequently, $\tilde{W}^{T_1^*}$ is explicitly given by the formula
\[
\tilde{W}_t^{T_1^*} = \tilde{W}_t^{T^*} - \int_0^t \frac{\delta_{n-1} \tilde{\kappa}(u, T_1^*)}{1 + \delta_{n-1}\delta_n^{-1} + \delta_{n-1} \tilde{\kappa}(u, T_1^*)}\, \nu(u, T_1^*)\, du
\]
for $t \in [0, T_1^*]$. We are in a position to define, using Girsanov's theorem, the associated forward swap measure $\tilde{\mathbb{P}}_{T_1^*}$. Subsequently, we introduce the process $\tilde{\kappa}(\cdot, T_2^*)$, by postulating that it solves the SDE
\[
d\tilde{\kappa}(t, T_2^*) = \tilde{\kappa}(t, T_2^*)\, \nu(t, T_2^*) \cdot d\tilde{W}_t^{T_1^*}
\]
with the initial condition
\[
\tilde{\kappa}(0, T_2^*) = \frac{B(0, T_2^*) - B(0, T^*)}{\delta_{n-1} B(0, T_1^*) + \delta_n B(0, T^*)}.
\]
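The first two steps of the backward induction can be illustrated by a simple simulation under the terminal measure $\mathbb{P}_{T^*}$. The sketch below is only an illustration under simplifying assumptions that are not part of the original text: the driving Brownian motion is one-dimensional, the volatilities $\nu(\cdot, T_1^*)$ and $\nu(\cdot, T_2^*)$ are constants, and a log-Euler discretization is used. The function names and all numerical inputs are hypothetical. The horizon should not exceed $T_2^*$, since $\tilde{\kappa}(\cdot, T_2^*)$ is defined only up to that date.

```python
import math
import random


def swap_measure_shift(kappa1: float, nu1: float, delta_prev: float, delta_last: float) -> float:
    """Integrand of the drift linking W^{T*} and W^{T_1*}.

    delta_prev stands for delta_{n-1}, delta_last for delta_n.
    """
    return delta_prev * kappa1 * nu1 / (1.0 + delta_prev / delta_last + delta_prev * kappa1)


def simulate_two_rates(kappa1_0, kappa2_0, nu1, nu2, delta_prev, delta_last,
                       horizon, steps, seed=0):
    """Log-Euler path of (kappa~(., T_1*), kappa~(., T_2*)) under P_{T*}."""
    rng = random.Random(seed)
    dt = horizon / steps
    k1, k2 = kappa1_0, kappa2_0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # increment of W = W^{T*}
        shift = swap_measure_shift(k1, nu1, delta_prev, delta_last)
        # d kappa2 = kappa2 * nu2 * dW^{T_1*}, where dW^{T_1*} = dW - shift * dt.
        k2 *= math.exp(nu2 * (dw - shift * dt) - 0.5 * nu2 ** 2 * dt)
        # d kappa1 = kappa1 * nu1 * dW, a driftless lognormal step.
        k1 *= math.exp(nu1 * dw - 0.5 * nu1 ** 2 * dt)
    return k1, k2


k1_T, k2_T = simulate_two_rates(0.05, 0.048, 0.20, 0.25, 0.5, 0.5,
                                horizon=1.0, steps=100, seed=1)
print(k1_T, k2_T)
```

Under $\mathbb{P}_{T^*}$ the rate $\tilde{\kappa}(\cdot, T_1^*)$ is driftless, whereas $\tilde{\kappa}(\cdot, T_2^*)$ picks up a state-dependent drift through the change of Brownian motion; this is the mechanism that the general inductive step repeats for later rates.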

For the reader's convenience, let us consider one more inductive step, in which we are looking for $\tilde{\kappa}(t, T_3^*)$. We now consider processes
\[
Z_3(t, T_k^*) = \frac{B(t, T_k^*)}{\delta_{n-2} B(t, T_2^*) + \delta_{n-1} B(t, T_1^*) + \delta_n B(t, T^*)} = \frac{Z_2(t, T_k^*)}{1 + \delta_{n-2} Z_2(t, T_2^*)},
\]
so that
\[
\tilde{W}_t^{T_2^*} = \tilde{W}_t^{T_1^*} - \int_0^t \frac{\delta_{n-2} Z_2(u, T_2^*)}{1 + \delta_{n-2} Z_2(u, T_2^*)}\, \gamma_2(u, T_2^*)\, du
\]
for $t \in [0, T_2^*]$. It is useful to note that
\[
Z_2(t, T_2^*) = \frac{B(t, T_2^*)}{\delta_{n-1} B(t, T_1^*) + \delta_n B(t, T^*)} = \tilde{\kappa}(t, T_2^*) + Z_2(t, T^*),
\]
where in turn
\[
Z_2(t, T^*) = \frac{Z_1(t, T^*)}{1 + \delta_{n-1} Z_1(t, T^*) + \delta_{n-1} \tilde{\kappa}(t, T_1^*)}
\]
and the process $Z_1(\cdot, T^*)$ is already known from the previous step (clearly, $Z_1(\cdot, T^*) = 1/\delta_n$). Differentiating the last equality, we may thus find the volatility of the process $Z_2(\cdot, T^*)$, and consequently, define $\tilde{\mathbb{P}}_{T_2^*}$.

We now examine the general case. We proceed by induction with respect to $m$. Suppose that we have found forward swap rates $\tilde{\kappa}(\cdot, T_1^*), \dots, \tilde{\kappa}(\cdot, T_m^*)$, the forward swap measure $\tilde{\mathbb{P}}_{T_{m-1}^*}$ and the associated Brownian motion $\tilde{W}^{T_{m-1}^*}$. Our aim is to determine the forward swap measure $\tilde{\mathbb{P}}_{T_m^*}$, the associated Brownian motion $\tilde{W}^{T_m^*}$, and the forward swap rate $\tilde{\kappa}(\cdot, T_{m+1}^*)$. To this end, we postulate that processes
\[
Z_{m+1}(t, T_k^*) = \frac{B(t, T_k^*)}{G_t(m+1)} = \frac{B(t, T_k^*)}{\delta_{n-m} B(t, T_m^*) + \cdots + \delta_n B(t, T^*)} = \frac{Z_m(t, T_k^*)}{1 + \delta_{n-m} Z_m(t, T_m^*)}
\]
follow local martingales under $\tilde{\mathbb{P}}_{T_m^*}$. In view of Lemma 2.3, applied to processes $G = Z_m(\cdot, T_k^*)$ and $H = \delta_{n-m} Z_m(\cdot, T_m^*)$, it is clear that we may set
\[
\tilde{W}_t^{T_m^*} = \tilde{W}_t^{T_{m-1}^*} - \int_0^t \frac{\delta_{n-m} Z_m(u, T_m^*)}{1 + \delta_{n-m} Z_m(u, T_m^*)}\, \gamma_m(u, T_m^*)\, du \tag{3.14}
\]
for $t \in [0, T_m^*]$. Therefore it is sufficient to analyse the process
\[
Z_m(t, T_m^*) = \frac{B(t, T_m^*)}{\delta_{n-m+1} B(t, T_{m-1}^*) + \cdots + \delta_n B(t, T^*)} = \tilde{\kappa}(t, T_m^*) + Z_m(t, T^*).
\]
To conclude, it is enough to notice that
\[
Z_m(t, T^*) = \frac{Z_{m-1}(t, T^*)}{1 + \delta_{n-m+1} Z_{m-1}(t, T^*) + \delta_{n-m+1} \tilde{\kappa}(t, T_{m-1}^*)}.
\]
Indeed, from the preceding step, we know that the process $Z_{m-1}(\cdot, T^*)$ is a (rational) function of forward swap rates $\tilde{\kappa}(\cdot, T_1^*), \dots, \tilde{\kappa}(\cdot, T_{m-1}^*)$. Consequently, the process under the integral sign on the right-hand side of (3.14) can be expressed using the terms $\tilde{\kappa}(\cdot, T_1^*), \dots, \tilde{\kappa}(\cdot, T_{m-1}^*)$ and their volatilities (since the explicit formula is rather lengthy, it is not reported here). Having found the process $\tilde{W}^{T_m^*}$ and probability measure $\tilde{\mathbb{P}}_{T_m^*}$, we introduce the forward swap rate $\tilde{\kappa}(\cdot, T_{m+1}^*)$ through (3.10)–(3.11), and so forth. If all volatilities are deterministic, the model is termed the lognormal model of fixed-maturity forward swap rates.

3.3 Valuation of swaptions

For a long time, Black's swaptions formula was merely a (widely used) practical tool to value swaptions. Indeed, the use of this formula was not supported by the existence of a reliable term structure model. Valuation and hedging of swaptions based on the suitable version of Black's formula was analysed, for instance, in Neuberger (1990). The formal derivation of these heuristic results within the framework of a well established term structure model was first achieved in Jamshidian (1997).

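3.3.1 Payer and receiver swaptions

The owner of a payer (receiver, respectively) swaption with strike rate $\kappa$, maturing at time $T = T_0$, has the right to enter at time $T$ the underlying forward payer (receiver, respectively) swap settled in arrears.$^{18}$ Because $FS_T(\kappa)$ is the value at time $T$ of the payer swap with the fixed interest rate $\kappa$, it is clear that the price of the payer swaption at time $t$ equals
\[
PS_t = \mathbb{E}_{\mathbb{P}^*}\Big( \frac{B_t}{B_T} \big( FS_T(\kappa) \big)^+ \,\Big|\, \mathcal{F}_t \Big).
\]
Using (3.3), we obtain
\[
PS_t = \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big[ \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (L(T_{j-1}) - \kappa)\delta_j \,\Big|\, \mathcal{F}_T \Big) \Big]^+ \,\Big|\, \mathcal{F}_t \Big\}. \tag{3.15}
\]
On the other hand, in view of (3.7) we also have
\[
PS_t = \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big[ \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa(T, T, n) - \kappa)\delta_j \,\Big|\, \mathcal{F}_T \Big) \Big]^+ \,\Big|\, \mathcal{F}_t \Big\}. \tag{3.16}
\]
The last equality yields
\begin{align*}
PS_t &= \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big[ \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa(T, T, n) - \kappa)\delta_j \,\Big|\, \mathcal{F}_T \Big) \Big]^+ \,\Big|\, \mathcal{F}_t \Big\} \\
&= \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T}\, \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa(T, T, n) - \kappa)^+ \delta_j \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big\} \\
&= \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \sum_{j=1}^{n} \delta_j B(T, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( (\kappa(T, T, n) - \kappa)^+ \,\big|\, \mathcal{F}_T \big) \,\Big|\, \mathcal{F}_t \Big\} \\
&= \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \sum_{j=1}^{n} \delta_j B(T, T_j) (\kappa(T, T, n) - \kappa)^+ \,\Big|\, \mathcal{F}_t \Big\} \\
&= \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big( 1 - \sum_{j=1}^{n} c_j B(T, T_j) \Big)^+ \,\Big|\, \mathcal{F}_t \Big\}.
\end{align*}

18 By convention, the notional principal of the underlying swap (and thus also the notional principal of the swaption) equals $N_p = 1$.

Similarly, for the receiver swaption, we have
\[
RS_t = \mathbb{E}_{\mathbb{P}^*}\Big( \frac{B_t}{B_T} \big( -FS_T(\kappa) \big)^+ \,\Big|\, \mathcal{F}_t \Big),
\]
that is
\[
RS_t = \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big[ \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa - L(T_{j-1}))\delta_j \,\Big|\, \mathcal{F}_T \Big) \Big]^+ \,\Big|\, \mathcal{F}_t \Big\}, \tag{3.17}
\]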

where we write $RS_t$ to denote the price at time $t$ of a receiver swaption. Consequently, reasoning in much the same way as in the case of a payer swaption, we get
\begin{align*}
RS_t &= \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big[ \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa - \kappa(T, T, n))\delta_j \,\Big|\, \mathcal{F}_T \Big) \Big]^+ \,\Big|\, \mathcal{F}_t \Big\} \\
&= \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T}\, \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa - \kappa(T, T, n))^+ \delta_j \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big\} \\
&= \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big( \sum_{j=1}^{n} c_j B(T, T_j) - 1 \Big)^+ \,\Big|\, \mathcal{F}_t \Big\}.
\end{align*}
We shall first focus on a payer swaption. In view of (3.15), it is apparent that a payer swaption is exercised at time $T$ if and only if the value of the underlying swap is positive at this date. It should be made clear that a swaption may be exercised by its owner only at its maturity date $T$. If exercised, a swaption gives rise to a sequence of cash flows at prescribed future dates. By considering the future cash flows from a swaption and from the corresponding market swap$^{19}$ available at time $T$, it is easily seen that the owner of a swaption is protected against the adverse movements of the swap rate that may occur before time $T$. Suppose, for instance, that the swap rate at time $T$ is greater than $\kappa$. Then by combining the swaption with a market swap, the owner of a swaption with exercise rate $\kappa$ is entitled to enter at time $T$, at no additional cost, a swap contract in which the fixed rate is $\kappa$. If, on the contrary, the swap rate at time $T$ is less than $\kappa$, the swaption is worthless, but its owner is, of course, able to enter a market swap contract based on the current swap rate $\kappa(T, T, n) \le \kappa$. Concluding, the fixed rate paid by the owner of a swaption who intends to initiate a swap contract at time $T$ will never be above the preassigned level $\kappa$.

19 At any time $t$, a market swap is that swap whose current value equals zero. Put more explicitly, it is the swap in which the fixed rate $\kappa$ equals the current swap rate.

Notice that we have shown, in particular, that
\[
PS_t = \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T}\, \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa(T, T, n) - \kappa)^+ \delta_j \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big\}. \tag{3.18}
\]
This shows that a payer swaption is essentially equivalent to a sequence of fixed payments $d_j^p = \delta_j (\kappa(T, T, n) - \kappa)^+$ which are received at settlement dates $T_1, \dots, T_n$, but whose value is known already at the expiry date $T$. In words, a payer swaption can be seen as a specific call option on a forward swap rate, with fixed strike level $\kappa$. The exercise date of the option is $T$, but the payoff takes place at each date $T_1, \dots, T_n$. This equivalence may also be derived by directly verifying that the future cash flows from the following portfolios established at time $T$ are identical: portfolio A, a swaption and a market swap; and portfolio B, a just described call option on a swap rate and a market swap. Indeed, both portfolios correspond to a payer swap with the fixed rate equal to $\kappa$. Finally, the equality
\[
PS_t = \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big( 1 - \sum_{j=1}^{n} c_j B(T, T_j) \Big)^+ \,\Big|\, \mathcal{F}_t \Big\} \tag{3.19}
\]
shows that the payer swaption may also be seen as a standard put option on a coupon-bearing bond with the coupon rate $\kappa$, with exercise date $T$ and strike price 1. Similar remarks are valid for the receiver swaption. In particular, a receiver swaption can also be viewed as a sequence of put options on a swap rate which are not allowed to be exercised separately. At time $T$ the long party receives the value of a sequence of cash flows, discounted from time $T_j$, $j = 1, \dots, n$, to the date $T$, defined by $\delta_j (\kappa - \kappa(T, T, n))^+$. On the other hand, a receiver swaption may be seen as a call option, with strike price 1 and expiry date $T$, written on a coupon bond with coupon rate equal to the strike rate $\kappa$ of the underlying forward swap. Let us finally mention the put–call parity relationship for swaptions. It follows easily from (3.15)–(3.17) that $PS_t - RS_t = FS_t$, i.e.,

payer swaption (t) − receiver swaption (t) = forward swap (t)

provided that both swaptions expire at the same date $T$ (and have the same contractual features).
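The dual interpretation of a swaption and the put–call parity can be verified at the expiry date $T = T_0$, where all quantities are functions of the bond prices $B(T, T_j)$ alone. The following sketch uses hypothetical bond prices and a hypothetical function name; it only compares time-$T$ values, that is, the random variables inside the conditional expectations (3.18) and (3.19).

```python
from typing import Dict, Sequence


def swaption_values_at_expiry(bonds: Sequence[float], deltas: Sequence[float],
                              kappa: float) -> Dict[str, float]:
    """Time-T values, with bonds[j] = B(T, T_j), j = 0..n, and bonds[0] = 1."""
    annuity = sum(d * bonds[j] for j, d in enumerate(deltas, start=1))
    swap_rate = (bonds[0] - bonds[-1]) / annuity          # kappa(T, T, n), cf. (3.6)
    coupons = [kappa * d for d in deltas]
    coupons[-1] += 1.0                                     # c_n = 1 + kappa * delta_n
    coupon_bond = sum(c * bonds[j] for j, c in enumerate(coupons, start=1))
    return {
        "swap_value": bonds[0] - coupon_bond,              # FS_T(kappa), cf. (3.4)
        "payer_as_rate_call": annuity * max(swap_rate - kappa, 0.0),
        "payer_as_bond_put": max(bonds[0] - coupon_bond, 0.0),
        "receiver_as_rate_put": annuity * max(kappa - swap_rate, 0.0),
        "receiver_as_bond_call": max(coupon_bond - bonds[0], 0.0),
    }


# Hypothetical bond prices observed at T = T_0, quarterly settlement.
bonds_T = [1.0, 0.99, 0.979, 0.967, 0.954]
deltas = [0.25, 0.25, 0.25, 0.25]
for strike in (0.03, 0.06):
    print(strike, swaption_values_at_expiry(bonds_T, deltas, strike))
```

For either strike, the call on the swap rate agrees with the put on the coupon bond, and the difference between the payer and receiver values reproduces the value of the underlying swap.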

3.3.2 Forward swaptions

Let us now consider a forward swaption. In this case, we assume that the expiry date $\hat{T}$ of the swaption precedes the initiation date $T$ of the underlying payer swap, that is, $\hat{T} \le T$. Recall that (cf. (3.7))
\[
FS_t(\kappa) = \sum_{j=1}^{n} \big( \kappa(t, T, n) - \kappa \big)\, \delta_j B(t, T_j)
\]
for $t \in [0, T]$. It is thus clear that the payoff $PS_{\hat{T}}$ at expiry $\hat{T}$ of the forward swaption (with strike 0) is either 0, if $\kappa \ge \kappa(\hat{T}, T, n)$, or
\[
PS_{\hat{T}} = \sum_{j=1}^{n} \big( \kappa(\hat{T}, T, n) - \kappa \big)\, \delta_j B(\hat{T}, T_j)
\]
if, on the contrary, inequality $\kappa(\hat{T}, T, n) > \kappa$ holds. We conclude that the payoff $PS_{\hat{T}}$ of the forward swaption can be represented in the following way:
\[
PS_{\hat{T}} = \sum_{j=1}^{n} \big( \kappa(\hat{T}, T, n) - \kappa \big)^+ \delta_j B(\hat{T}, T_j). \tag{3.20}
\]

This means that, if exercised, the forward swaption gives rise to a sequence of payments $\delta_j (\kappa(\hat{T}, T, n) - \kappa)$ at the settlement dates $T_1, \dots, T_n$. By substituting $\hat{T} = T$ we recover, in a more intuitive way and in a more general setting, the previously observed dual nature of the swaption: it may be seen either as an option on the value of a particular (forward) swap or, equivalently, as an option on the corresponding (forward) swap rate. It is also clear that the owner of a forward swaption is able to enter at time $\hat{T}$ (at no additional cost) into a forward payer swap with preassigned fixed interest rate $\kappa$.

3.3.3 Valuation in the lognormal model of forward Libor rates

Recall that within the general framework, the price at time $t \in [0, T_0]$ of a payer swaption$^{20}$ with expiry date $T = T_0$ and strike level $\kappa$ equals
\[
PS_t = \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T} \Big[ \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (L(T_{j-1}) - \kappa)\delta_j \,\Big|\, \mathcal{F}_T \Big) \Big]^+ \,\Big|\, \mathcal{F}_t \Big\}.
\]
Let $D \in \mathcal{F}_T$ be the exercise set of a swaption; that is
\[
D = \{ \omega \in \Omega \mid (\kappa(T, T, n) - \kappa)^+ > 0 \} = \Big\{ \omega \in \Omega \,\Big|\, \sum_{j=1}^{n} c_j B(T, T_j) < 1 \Big\}.
\]

Lemma 3.3 The following equality holds for every $t \in [0, T]$:
\[
PS_t = \sum_{j=1}^{n} \delta_j B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( (L(T, T_{j-1}) - \kappa)\, I_D \,\big|\, \mathcal{F}_t \big). \tag{3.21}
\]

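Proof Since
\[
PS_t = \mathbb{E}_{\mathbb{P}^*}\Big\{ \frac{B_t}{B_T}\, I_D\, \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (L(T_{j-1}) - \kappa)\delta_j \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big\},
\]
we have
\begin{align*}
PS_t &= \mathbb{E}_{\mathbb{P}^*}\Big\{ \mathbb{E}_{\mathbb{P}^*}\Big( \sum_{j=1}^{n} \frac{B_t}{B_{T_j}} (L(T_{j-1}) - \kappa)\delta_j I_D \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big\} \\
&= \sum_{j=1}^{n} B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( (L(T_{j-1}) - \kappa)\delta_j I_D \,\big|\, \mathcal{F}_t \big),
\end{align*}
where $L(T_{j-1}) = L(T_{j-1}, T_{j-1})$. For any $j = 1, \dots, n$, we have
\begin{align*}
\mathbb{E}_{\mathbb{P}_{T_j}}\big( (L(T_{j-1}) - \kappa)\, I_D \,\big|\, \mathcal{F}_t \big) &= \mathbb{E}_{\mathbb{P}_{T_j}}\Big( \mathbb{E}_{\mathbb{P}_{T_j}}\big( L(T_{j-1}) - \kappa \,\big|\, \mathcal{F}_T \big)\, I_D \,\Big|\, \mathcal{F}_t \Big) \\
&= \mathbb{E}_{\mathbb{P}_{T_j}}\big( (L(T, T_{j-1}) - \kappa)\, I_D \,\big|\, \mathcal{F}_t \big),
\end{align*}
since $\mathcal{F}_t \subset \mathcal{F}_T$ and the process $L(t, T_{j-1})$ is a $\mathbb{P}_{T_j}$-martingale.

20 Since the relationship $PS_t - RS_t = FS_t$ is always valid, and the value of a forward swap is given by (3.4), it is enough to examine the case of a payer swaption.

For any $k = 1, \dots, n$, we define the random variable $\zeta_k(t)$ by setting
\[
\zeta_k(t) = \int_t^T \lambda(u, T_{k-1}) \cdot dW_u^{T_k}, \quad \forall\, t \in [0, T], \tag{3.22}
\]
and we write
\[
\lambda_k^2(t) = \int_t^T |\lambda(u, T_{k-1})|^2\, du, \quad \forall\, t \in [0, T]. \tag{3.23}
\]
Note that for every $k = 1, \dots, n$ and $t \in [0, T]$, we have
\[
L(T, T_{k-1}) = L(t, T_{k-1})\, e^{\zeta_k(t) - \lambda_k^2(t)/2}.
\]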

Recall also that the processes $W^{T_k}$ satisfy the following relationship:
\[
W_t^{T_{k+1}} = W_t^{T_k} + \int_0^t \frac{\delta_{k+1} L(u, T_k)}{1 + \delta_{k+1} L(u, T_k)}\, \lambda(u, T_k)\, du
\]
for $t \in [0, T_k]$ and $k = 0, \dots, n-1$. For ease of notation, we formulate the next result for $t = 0$ only; a general case can be treated along the same lines. For any fixed $j$, we denote by $G_j$ the joint probability distribution function of the $n$-dimensional random variable $(\zeta_1(0), \dots, \zeta_n(0))$ under the forward measure $\mathbb{P}_{T_j}$.

Proposition 3.4 Assume the lognormal model of Libor rates. The price at time 0 of a payer swaption with expiry date $T = T_0$ and strike level $\kappa$ equals
\[
PS_0 = \sum_{j=1}^{n} \delta_j B(0, T_j) \int_{\mathbb{R}^n} \Big( L(0, T_{j-1})\, e^{y_j - \lambda_j^2(0)/2} - \kappa \Big)\, I_{\tilde{D}}\; dG_j(y_1, \dots, y_n),
\]
where $I_{\tilde{D}} = I_{\tilde{D}}(y_1, \dots, y_n)$, and $\tilde{D}$ stands for the set
\[
\tilde{D} = \Big\{ (y_1, \dots, y_n) \in \mathbb{R}^n \,\Big|\, \sum_{j=1}^{n} c_j \prod_{k=1}^{j} \Big( 1 + \delta_k L(0, T_{k-1})\, e^{y_k - \lambda_k^2(0)/2} \Big)^{-1} < 1 \Big\}.
\]

[…] $K > 0$ is a constant, and $D = \{ V_T^1 > K V_T^2 \}$ is the exercise set. It is easy to check using the abstract Bayes rule that the equality
\[
\frac{d\mathbb{P}_1}{d\mathbb{P}_2} = \frac{V_0^2\, V_T^1}{V_0^1\, V_T^2}, \quad \mathbb{P}_2\text{-a.s.}, \tag{3.27}
\]

links the martingale measures $\mathbb{P}_1$ and $\mathbb{P}_2$ associated with the choice of value processes $V^1$ and $V^2$ as discount factors, respectively (both probability measures are considered here on $(\Omega, \mathcal{F}_T)$). Furthermore, the arbitrage price of the option admits the following representation
\[
C_t = V_t^1\, \mathbb{P}_1(D \mid \mathcal{F}_t) - K V_t^2\, \mathbb{P}_2(D \mid \mathcal{F}_t), \quad \forall\, t \in [0, T], \tag{3.28}
\]

where $D = \{ V_T^1 > K V_T^2 \}$. To obtain the Black–Scholes-like formula for the option's price $C_t$, it is enough to assume that the relative price $V^1/V^2$ follows a lognormal martingale under $\mathbb{P}_2$, so that
\[
d(V_t^1/V_t^2) = (V_t^1/V_t^2)\, \gamma_t^{1,2} \cdot dW_t^{1,2} \tag{3.29}
\]
for a deterministic function $\gamma^{1,2} : [0, T] \to \mathbb{R}^d$ (for simplicity, we also assume that the function $\gamma^{1,2}$ is bounded). In view of (3.27), the Radon–Nikodým density of $\mathbb{P}_1$ with respect to $\mathbb{P}_2$ equals
\[
\frac{d\mathbb{P}_1}{d\mathbb{P}_2} = \mathcal{E}_T\Big( \int_0^{\cdot} \gamma_u^{1,2} \cdot dW_u^{1,2} \Big), \quad \mathbb{P}_2\text{-a.s.}, \tag{3.30}
\]
and thus the process
\[
W_t^{2,1} = W_t^{1,2} - \int_0^t \gamma_u^{1,2}\, du, \quad \forall\, t \in [0, T],
\]
is a standard Brownian motion under $\mathbb{P}_1$. Reasoning in much the same way as in the proof of the classic Black–Scholes formula (see, for instance, the proof of Theorem 5.1.1 in Musiela and Rutkowski (1997a)), we obtain
\[
C_t = V_t^1\, N\big( d_1(t, T) \big) - K V_t^2\, N\big( d_2(t, T) \big), \tag{3.31}
\]
where
\[
d_{1,2}(t, T) = \frac{\ln(V_t^1/V_t^2) - \ln K \pm \tfrac{1}{2} v_{1,2}^2(t, T)}{v_{1,2}(t, T)}
\]
and
\[
v_{1,2}^2(t, T) = \int_t^T |\gamma_u^{1,2}|^2\, du, \quad \forall\, t \in [0, T].
\]
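Formula (3.31) is straightforward to implement once the two value processes and the total volatility $v_{1,2}(t, T)$ are known. The sketch below is illustrative rather than definitive: the function names are hypothetical, the standard normal distribution function is computed through `math.erf`, and the swaption example assumes a constant volatility of the forward swap rate together with made-up bond prices. The choice of $V^1$ and $V^2$ for a swaption follows footnote 22.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_type_price(v1: float, v2: float, strike: float, total_vol: float) -> float:
    """C_t = V1 N(d1) - K V2 N(d2) as in (3.31); total_vol = v_{1,2}(t, T) > 0."""
    log_moneyness = math.log(v1 / v2) - math.log(strike)
    d1 = (log_moneyness + 0.5 * total_vol ** 2) / total_vol
    d2 = d1 - total_vol
    return v1 * norm_cdf(d1) - strike * v2 * norm_cdf(d2)


# Hypothetical payer swaption at time 0: expiry T_0 = 1, semi-annual settlement.
bonds = [0.97, 0.955, 0.94, 0.924, 0.908]      # B(0, T_j), j = 0..4 (assumed values)
deltas = [0.5, 0.5, 0.5, 0.5]
v1 = bonds[0] - bonds[-1]                        # V^1 = B(0, T_0) - B(0, T_n)
v2 = sum(d * b for d, b in zip(deltas, bonds[1:]))  # V^2 = sum_k delta_k B(0, T_k)
sigma = 0.20                                     # assumed constant swap-rate volatility
price = black_type_price(v1, v2, strike=0.03, total_vol=sigma * math.sqrt(1.0))
print(price)
```

With $V^1 = V^2 = 1$, $K = 1$ and $v_{1,2} = 0.2$ the formula reduces to $2N(0.1) - 1$, which gives a convenient sanity check of the implementation.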

Of course, the caps and swaptions$^{22}$ valuation formulae in lognormal models described above can be seen as special cases of (3.31). The idea can be, of course, applied to other interest rate derivatives. It is worthwhile noting that in order to get the valuation result (3.31) for $t = 0$, it is enough to assume that the random variable $V_T^1/V_T^2$ has a lognormal probability law under the martingale measure $\mathbb{P}_2$. This simple observation underpins the construction of the so-called Markov-functional interest rate models; this alternative approach to term structure modelling is briefly reviewed in the next section.

A more straightforward generalization of lognormal models of the term structure was developed by Andersen and Andreasen (1997). In this case, the assumption that the volatility is deterministic is replaced by a suitable functional form of the volatility. The resulting models are capable of handling the so-called volatility skew in observed option prices (empirical studies have shown that the implied volatilities of observed caps and swaptions prices tend to be decreasing functions of the strike level). The main focus in Andersen and Andreasen (1997) is on the use of the CEV process$^{23}$ as a model of the forward Libor rate. Put more explicitly, they generalize equality (2.20) by postulating that
\[
dL(t, T_j) = L^{\alpha}(t, T_j)\, \lambda(t, T_j) \cdot dW_t^{T_{j+1}}, \quad \forall\, t \in [0, T_j],
\]
where $\alpha > 0$ is a strictly positive constant. They derive closed-form solutions for caplet prices under the above specification of the dynamics of Libor rates with $\alpha \ne 1$, in terms of the cumulative distribution function of a non-central $\chi^2$ probability law. It appears that, depending on the choice of the parameter $\alpha$, the implied Black's volatilities of caplet prices, considered as a function of the strike level $\kappa > 0$, exhibit downward- or upward-sloping skew.

22 For the $j$th caplet, we take $V_t^1 = B(t, T_j) - B(t, T_{j+1})$ and $V_t^2 = \delta_{j+1} B(t, T_{j+1})$. In the case of the $j$th swaption, we have $V_t^1 = B(t, T_j) - B(t, T_n)$ and $V_t^2 = \sum_{k=j+1}^{n} \delta_k B(t, T_k)$.

23 In the context of equity options, the CEV (constant elasticity of variance) process was first introduced in Cox and Ross (1976).

4 Markov-functional models

As shown in Section 2.2.4, the forward Libor or swap$^{24}$ rates follow a multidimensional Markov process under any of the associated forward measures. In principle, lognormal models can be easily calibrated to market prices of caps (or swaptions), which is, of course, a nice feature of this class of term structure models, as opposed to the classic models based on the specification of the dynamics of (spot or forward) instantaneous rates. On the other hand, however, due to the high dimensionality of the underlying Markov process, the efficient implementation of these models appears to be rather difficult. To circumvent this obstacle, an alternative approach was recently developed in a series of papers by Hunt and Kennedy (1997, 1998) and Hunt et al. (1996, 2000).$^{25}$ It is based on the introduction of a low-dimensional Markov process which (by assumption) governs, through a simple functional dependence, the dynamics of all other relevant stochastic processes. For this reason, this class of term structure models is referred to as Markov-functional interest rate models. In an economic interpretation, the underlying Markov process is assumed to represent the state of the economy; it is thus justified to refer to its components as “state variables”.

24 The multi-dimensional SDE which governs the dynamics of the family of forward swap rates is more involved than the SDE for the family of Libor rates, and thus it is not reported here. The interested reader is referred to Jamshidian (1997).

Formally, one starts by introducing a one- or multi-dimensional process $M$, which possesses the Markov property under the terminal measure, where the generic term terminal measure is intended to cover not only cases considered in previous sections, but also other suitable choices of the numeraire portfolio. As already mentioned, the relevant processes, such as in particular the value process of the numeraire portfolio and zero-coupon bond prices, are assumed to be functions of $M$. For instance, if $T^* > 0$ is the horizon date, then for any $t \le s \le T$ we have
\[
\frac{B(t, T, M_t)}{V_t(M_t)} = \mathbb{E}_{\hat{\mathbb{P}}}\Big( \frac{B(s, T, M_s)}{V_s(M_s)} \,\Big|\, \mathcal{F}_t \Big),
\]
where $V_t(M_t)$, $t \le T^*$, is the value process of the numeraire portfolio, and $\hat{\mathbb{P}}$ is the associated martingale measure. The notation $B(t, T, M_t)$ emphasizes the direct dependence of the bond price on time variables, $t$ and $T$, as well as on the state variable represented by the random variable $M_t$.
Note that the functional form $B(t, T, M_t)$ is not explicitly known, except for some very special choices of dates $t$ and $T$. In some instances, it may appear convenient to postulate that$^{26}$
\[
\frac{B(T, S, M_T)}{V_T(M_T)} = A + B(S) M_T
\]
and to derive further properties from the martingale feature of relative prices. In the next section, we shall present a particular example of such an approach, in which we focus on the derivation of a simple formula for the so-called convexity correction. Then, in Section 4.2, we shall discuss the problem of calibration of the Markov-functional model.

25 We present here only a few examples of their approach. The interested reader is referred to the original papers and to Hunt and Kennedy (2000) for a more detailed account.

26 See Hunt et al. (1996) for alternative kinds of the functional dependence, including exponential and geometric.

4.1 Terminal swap rate model

The terminal swap rate model, put forward by Hunt et al. (1996), was primarily designed for the purpose of the comparative pricing of non-standard swap contracts vis-à-vis plain vanilla swaps (informally, this is referred to as convexity correction; see Schmidt (1996)). Let us consider, as usual, a given collection of reset/settlement dates $T_0, \dots, T_n$. We assume that the market price at time 0 of the (plain vanilla) fixed-for-floating swaption is known. We postulate, in addition, that it is given by Black's formula for swaptions. Let us consider the family of bond prices $B(T, S)$, where the maturity date $S \ge T$ belongs to some set $\mathcal{S}$ of dates. We postulate that there exist constants $A$ and $B_S$ such that for any $S \in \mathcal{S}$
\[
D(T, S) := B(T, S)\, G_T^{-1}(n) = A + B_S\, \kappa(T, T, n), \tag{4.1}
\]
where $G_t(n) = \sum_{j=1}^{n} \delta_j B(t, T_j)$, and (cf. (3.8))
\[
\kappa(t, T, n) = \frac{B(t, T) - B(t, T_n)}{\delta_1 B(t, T_1) + \cdots + \delta_n B(t, T_n)} = \frac{B(t, T) - B(t, T_n)}{G_t(n)}.
\]

Using the martingale property of the discounted bond price $D(\cdot, S)$ and of the forward swap rate $\kappa(\cdot, T, n)$ under the forward swap measure associated with the choice of $G(n)$ as a numeraire, we get $D(t, S) = A + B_S\,\kappa(t, T, n)$, or equivalently

$$B(t, S) = A\,G_t(n) + B_S\,\big(B(t, T) - B(t, T_n)\big)$$

for every $t \in [0, T]$. We thus see that condition (4.1) is rather stringent; it implies that the price of any bond of maturity $S$ from $\mathcal{S}$ can be represented as a linear combination of the values of two particular portfolios of bonds, with one coefficient independent of the maturity date $S$. The problem of whether such an assumption can be supported by an arbitrage-free model of the term structure is not addressed in Hunt et al. (1996). Let us now focus on the derivation of the values of the constants $A$ and $B_S$. To this end, we assume that equality (4.1) holds, in particular, for any $S = T_j$, $j = 1, \dots, n$. Multiplying (4.1) by $\delta_j$, summing over $j$, and noting that $\sum_{j=1}^{n} \delta_j B(T, T_j)\,G_T^{-1}(n) = 1$, we get

$$A \sum_{j=1}^{n} \delta_j + \sum_{j=1}^{n} \delta_j B_{T_j}\,\kappa(T, T, n) = A(T_n - T_0) + \sum_{j=1}^{n} \delta_j B_{T_j}\,\kappa(T, T, n) = 1,$$

and thus

$$A = (T_n - T_0)^{-1}, \qquad \sum_{j=1}^{n} \delta_j B_{T_j} = 0. \qquad (4.2)$$


M. Rutkowski

Consequently, using the first equality above and the martingale property of $D(\cdot, S)$ and $\kappa(\cdot, T, n)$, we obtain

$$B(0, S)\,G_0^{-1}(n) = (T_n - T_0)^{-1} + B_S\,\kappa(0, T, n), \qquad (4.3)$$

so that for each maturity in question the constant $B_S$ is also uniquely determined. Notice that the second equality in (4.2) is also satisfied for this choice of $B_S$. Hunt and Kennedy (2000) argue that under (4.1) the problem of pricing irregular cashflows becomes relatively easy to handle. To illustrate this point, assume that we wish to value the claim $X$ which settles at time $T$ and admits the following representation:

$$X = \sum_{i=1}^{m} c_i\,B(T, S_i)\,F,$$

where the $c_i$ are constants, and $S_i \in \mathcal{S}$ for $i = 1, \dots, m$. We assume that the $\mathcal{F}_T$-measurable random variable $F$ has the form $F = \tilde{F}\big(B(T, S_1), \dots, B(T, S_m)\big)$ for some function $\tilde{F} : \mathbb{R}_+^m \to \mathbb{R}$. To be in line with the notation introduced in Section 3.4, we denote

$$V_t^1 = B(t, T) - B(t, T_n), \qquad V_t^2 = \sum_{j=1}^{n} \delta_j B(t, T_j) = G_t(n).$$
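The constants in (4.2)–(4.3) can be computed directly from an initial discount curve. The following is a minimal sketch, not part of the original text: the flat 5% curve, the date grid and the helper names `discount` and `B_coef` are illustrative assumptions. It evaluates $A$ and $B_S$ and checks numerically that the second equality in (4.2) holds for the payment dates.

```python
import numpy as np

# Hypothetical inputs: reset date T_0 = T, payment dates T_1..T_n, flat curve.
T0 = 1.0
pay_dates = np.array([2.0, 3.0, 4.0, 5.0])              # T_1, ..., T_n
rate = 0.05                                             # assumed flat zero rate


def discount(t):
    """B(0, t) under the assumed flat curve."""
    return np.exp(-rate * t)


deltas = np.diff(np.concatenate(([T0], pay_dates)))      # delta_j = T_j - T_{j-1}
G0 = float(np.sum(deltas * discount(pay_dates)))         # G_0(n)
kappa0 = (discount(T0) - discount(pay_dates[-1])) / G0   # kappa(0, T, n)

A = 1.0 / (pay_dates[-1] - T0)                           # first equality in (4.2)


def B_coef(S):
    """B_S solved from (4.3): B(0,S)/G_0(n) = A + B_S * kappa(0,T,n)."""
    return (discount(S) / G0 - A) / kappa0


B_T = np.array([B_coef(S) for S in pay_dates])
constraint = float(np.sum(deltas * B_T))                 # second equality in (4.2)
print(A, kappa0, constraint)
```

The printed value of `constraint` should vanish up to rounding error, because the accrual periods sum to $T_n - T_0$.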

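Using (4.1) and (4.2)–(4.3), we obtain

$$X = \sum_{i=1}^{m} c_i \Big( A\,G_T(n) + B_{S_i}\big(1 - B(T, T_n)\big) \Big) F = w_1 V_T^1 F + w_2 V_T^2 F,$$

where $w_1 = \sum_{i=1}^{m} c_i B_{S_i}$ and $w_2 = \sum_{i=1}^{m} c_i A$. In view of the discussion in Section 3.4, it is clear that

$$\pi_t(X) = w_1 V_t^1\,\mathbb{E}_{\mathbb{P}_1}(F \mid \mathcal{F}_t) + w_2 V_t^2\,\mathbb{E}_{\mathbb{P}_2}(F \mid \mathcal{F}_t). \qquad (4.4)$$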

Under the assumption that the forward swap rate $\kappa(\cdot, T, n)$ follows a geometric Brownian motion under the forward swap measure $\mathbb{P}_2$, it is also lognormally distributed under $\mathbb{P}_1$ (see the discussion in Section 3.4). Consequently, under (4.1), the joint (conditional) probability law of the random variables $B(T, S_1), \dots, B(T, S_m)$ under the probability measures $\mathbb{P}_1$ and $\mathbb{P}_2$ is explicitly known. We conclude that the conditional expectations in (4.4) can be, in principle, evaluated. Consider, for instance, a fixed-for-floating constant maturity swap.$^{27}$ To value one leg of the floating side of a constant maturity swap, consider a cashflow proportional to $\kappa(T, T, n)$, which takes place at some date $M > T$. Ignoring the constant, such a payoff is equivalent to the claim $X = B(T, M)\,\kappa(T, T, n)$ which settles at time $T$. Using (4.4), we obtain

$$\pi_t(X) = B_M V_t^1\,\mathbb{E}_{\mathbb{P}_1}\big(\kappa(T, T, n) \mid \mathcal{F}_t\big) + A V_t^2\,\mathbb{E}_{\mathbb{P}_2}\big(\kappa(T, T, n) \mid \mathcal{F}_t\big).$$

Consequently, at time 0 we have

$$\pi_0(X) = B_M \big(B(0, T) - B(0, T_n)\big)\,\kappa(0, T, n)\,e^{\sigma^2 T} + A\,G_0(n)\,\kappa(0, T, n),$$

where $\sigma$ is the implied volatility of the traded swaption with maturity date $T$. Using the formula for $B_M$, we get

$$\pi_0(X) = \big(B(0, M) - A\,G_0(n)\big)\,\kappa(0, T, n)\,e^{\sigma^2 T} + A\,G_0(n)\,\kappa(0, T, n),$$

or finally

$$\pi_0(X) = B(0, M)\,\kappa(0, T, n)\,\big( w + (1 - w)\,e^{\sigma^2 T} \big), \qquad (4.5)$$

where we write $w = A\,G_0(n)\,B^{-1}(0, M)$. It should be stressed that the simple valuation result (4.5) hinges on the strong assumption (4.1).

27 As in the case of a plain vanilla fixed-for-floating swap, in a constant maturity swap the fixed and floating payments occur at regularly spaced dates. The amounts of the floating payments, however, are based not on a Libor rate but on some other swap rate.
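As a numerical illustration of the convexity correction, the sketch below evaluates the two-term expression for $\pi_0(X)$ and the form obtained by factoring $B(0, M)\,\kappa(0, T, n)$ out of it, namely $B(0, M)\,\kappa(0, T, n)\,[w + (1 - w)e^{\sigma^2 T}]$. The flat curve, the date grid, the volatility and all function names are hypothetical choices made only for this example.

```python
import math


def cms_leg_value(B0_M, G0, kappa0, A, sigma, T):
    """Factored form: B(0,M) * kappa(0,T,n) * (w + (1 - w) * exp(sigma^2 T))."""
    w = A * G0 / B0_M
    return B0_M * kappa0 * (w + (1.0 - w) * math.exp(sigma ** 2 * T))


def cms_leg_value_two_terms(B0_M, B0_T, B0_Tn, G0, kappa0, A, sigma, T):
    """Two-term form, with B_M obtained from (4.3)."""
    B_M = (B0_M / G0 - A) / kappa0
    return (B_M * (B0_T - B0_Tn) * kappa0 * math.exp(sigma ** 2 * T)
            + A * G0 * kappa0)


# Illustrative inputs (assumptions): flat 5% curve, annual accrual periods.
r, T, M = 0.05, 1.0, 2.0
pay_dates = [2.0, 3.0, 4.0, 5.0]


def B(t):
    return math.exp(-r * t)


G0 = sum(1.0 * B(t) for t in pay_dates)         # all delta_j equal to 1
kappa0 = (B(T) - B(pay_dates[-1])) / G0
A = 1.0 / (pay_dates[-1] - T)

value = cms_leg_value(B(M), G0, kappa0, A, sigma=0.2, T=T)
no_correction = B(M) * kappa0                   # value when sigma = 0
print(value, no_correction)
```

With $\sigma = 0$ the correction disappears and the value reduces to $B(0, M)\,\kappa(0, T, n)$; for $w < 1$ a positive volatility raises the value above that level.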

4.2 Calibration of Markov-functional models

The most important feature of Markov-functional models is the fact that their calibration to market prices of plain vanilla derivatives is relatively easy to perform. For convenience, we shall focus here on the calibration of the Markov-functional model of fixed-maturity forward swap rates. The case of forward Libor rates can be dealt with in an analogous way. A more extensive discussion of this issue can be found in Hunt et al. (2000). First, we assume that the forward swap rate for the date $T_{n-1}$ follows a lognormal martingale under the corresponding forward measure $\mathbb{P}_{T_n}$. More specifically, we postulate that the process $\tilde{\kappa}(\cdot, T_{n-1}) = \kappa(\cdot, T_{n-1}, 1)$ satisfies

$$d\tilde{\kappa}(t, T_{n-1}) = \tilde{\kappa}(t, T_{n-1})\,\nu(t, T_{n-1})\,dW_t, \qquad (4.6)$$

where $W$ is a Brownian motion under $\mathbb{P}_{T_n}$ and $\nu(\cdot, T_{n-1})$ is a strictly positive deterministic function. If we take the process

$$M_t = \int_0^t \nu(u, T_{n-1})\,dW_u$$

as the driving Markov process for our model, then clearly

$$\tilde{\kappa}(T_{n-1}, T_{n-1}) = \tilde{\kappa}(0, T_{n-1})\,\exp\Big( M_{T_{n-1}} - \tfrac{1}{2} \int_0^{T_{n-1}} \nu^2(u, T_{n-1})\,du \Big) \qquad (4.7)$$


and

$$B(T_{n-1}, T_n, M_{T_{n-1}}) = \Big( 1 + \delta_n\,\tilde{\kappa}(0, T_{n-1})\,\exp\Big( M_{T_{n-1}} - \tfrac{1}{2} \int_0^{T_{n-1}} \nu^2(u, T_{n-1})\,du \Big) \Big)^{-1}. \qquad (4.8)$$
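To make the functional forms (4.7) and (4.8) concrete, here is a small sketch under the simplifying assumption of a constant volatility $\nu$, so that $M_{T_{n-1}}$ is Gaussian with mean zero and variance $\nu^2 T_{n-1}$. The parameter values and the names `swap_rate` and `bond` are illustrative only. The quadrature at the end checks the martingale property of the swap rate under $\mathbb{P}_{T_n}$.

```python
import numpy as np

delta_n = 0.5            # accrual period T_n - T_{n-1} (assumed)
kappa0 = 0.04            # kappa~(0, T_{n-1}) (assumed)
nu = 0.2                 # constant volatility nu(u, T_{n-1}) (assumption)
T_nm1 = 2.0              # T_{n-1}
var = nu ** 2 * T_nm1    # integral of nu^2 over [0, T_{n-1}]


def swap_rate(m):
    """Formula (4.7): kappa~(T_{n-1}, T_{n-1}) as a function of M_{T_{n-1}} = m."""
    return kappa0 * np.exp(m - 0.5 * var)


def bond(m):
    """Formula (4.8): B(T_{n-1}, T_n, m)."""
    return 1.0 / (1.0 + delta_n * swap_rate(m))


# Gauss-Hermite quadrature (probabilists' version) for a N(0, var) variable.
nodes, weights = np.polynomial.hermite_e.hermegauss(40)
weights = weights / np.sqrt(2.0 * np.pi)      # weights now sum to one
m_points = np.sqrt(var) * nodes
mean_rate = float(np.sum(weights * swap_rate(m_points)))
print(mean_rate, kappa0)                      # the two should agree
```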

Suppose that we are given (digital) swaption prices for all strikes $\kappa > 0$ and all expiration dates $T_0, \dots, T_{n-1}$. Our goal is to find the joint probability law of $(\tilde{\kappa}(T_0, T_0), \dots, \tilde{\kappa}(T_{n-1}, T_{n-1}))$ under $\mathbb{P}_{T_n}$. This can be achieved by deriving the functional dependence of each rate $\tilde{\kappa}(T_j, T_j)$ on the underlying Markov process; more specifically, we search for the function $h_j : \mathbb{R} \to \mathbb{R}_+$ such that $\tilde{\kappa}(T_j, T_j) = h_j(M_{T_j})$. To this end, we assume that for any $j = 0, \dots, n-1$ there exists a strictly increasing function $h_j$ such that this holds (in view of (4.7), this statement is valid for $j = n-1$). By the definition of the probability measure $\mathbb{P}_{T_n}$, for $i = j+1, \dots, n$

$$\frac{B(T_j, T_i)}{B(T_j, T_n)} = \mathbb{E}_{\mathbb{P}_{T_n}}\left( \frac{B(T_i, T_i)}{B(T_i, T_n)} \,\Big|\, \mathcal{F}_{T_j} \right) = \mathbb{E}_{\mathbb{P}_{T_n}}\left( \frac{B(T_i, T_i)}{B(T_i, T_n)} \,\Big|\, M_{T_j} \right)$$

since $\mathcal{F}_{T_j} = \mathcal{F}^W_{T_j} = \mathcal{F}^M_{T_j}$ and $M$ is a Markov process. Therefore, if $B(T_i, T_n) = B(T_i, T_n, M_{T_i})$ we obtain

$$\frac{B(T_j, T_i)}{B(T_j, T_n)} = \mathbb{E}_{\mathbb{P}_{T_n}}\left( \frac{1}{B(T_i, T_n, M_{T_i})} \,\Big|\, M_{T_j} \right),$$

so that the right-hand side in the formula above is a function of $M_{T_j}$. Consequently, for

$$G_{T_j}(n - j) = \sum_{i=j+1}^{n} \delta_i B(T_j, T_i)$$

we get

$$\frac{G_{T_j}(n - j)}{B(T_j, T_n)} = \mathbb{E}_{\mathbb{P}_{T_n}}\left( \sum_{i=j+1}^{n} \frac{\delta_i}{B(T_i, T_n, M_{T_i})} \,\Big|\, M_{T_j} \right) = g_j(M_{T_j}), \qquad (4.9)$$

where $g_j : \mathbb{R} \to \mathbb{R}$ is a measurable function with strictly positive values. The right-hand side in (4.9) can be evaluated using the transition p.d.f. $p_M(t, m; u, x)$ of the Markov process $M$, provided that the functional form of $B(T_i, T_n, M_{T_i})$ is known for every $i = j+1, \dots, n$. To put it more explicitly,

$$g_j(m) = \sum_{i=j+1}^{n} \delta_i \int_{\mathbb{R}} \frac{p_M(T_j, m; T_i, x)}{B(T_i, T_n, x)}\,dx. \qquad (4.10)$$

We work back iteratively from the last relevant date $T_{n-1}$. In the first step, i.e., when $j = n-2$, the functional form of $B(T_{n-1}, T_n, M_{T_{n-1}})$ is given by (4.8). Assume now that the functional forms of $B(T_i, T_n, M_{T_i})$ were already found for $i = j+1, \dots, n-1$. In order to determine $B(T_j, T_n, M_{T_j})$, it is enough to find the functional form of the swap rate $\tilde{\kappa}(T_j, T_j)$. Indeed, we have

$$\tilde{\kappa}(T_j, T_j) = \frac{1 - B(T_j, T_n)}{G_{T_j}(n - j)}$$

and thus

$$B^{-1}(T_j, T_n) = 1 + \tilde{\kappa}(T_j, T_j)\,\frac{G_{T_j}(n - j)}{B(T_j, T_n)} = 1 + h_j(M_{T_j})\,g_j(M_{T_j}). \qquad (4.11)$$
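The first backward step, the evaluation of $g_{n-2}$ through formula (4.10), can be sketched numerically as follows. The example assumes a constant volatility $\nu$, so that the transition density of $M$ is Gaussian with variance $\nu^2 (T_{n-1} - T_{n-2})$; the date grid, the parameter values and the function names are hypothetical. In this special case a closed form is available and serves as a check on the quadrature.

```python
import numpy as np

nu = 0.2                               # constant volatility (assumption)
T_nm2, T_nm1 = 1.5, 2.0                # T_{n-2}, T_{n-1}; T_n = 2.5 (assumed grid)
delta_nm1, delta_n = 0.5, 0.5          # accrual periods delta_{n-1}, delta_n
kappa0 = 0.04                          # kappa~(0, T_{n-1}) (assumed)


def inv_bond_last(x):
    """1 / B(T_{n-1}, T_n, x), taken from (4.8) with constant nu."""
    return 1.0 + delta_n * kappa0 * np.exp(x - 0.5 * nu ** 2 * T_nm1)


nodes, weights = np.polynomial.hermite_e.hermegauss(40)
weights = weights / np.sqrt(2.0 * np.pi)         # N(0,1) quadrature weights


def g_nm2(m):
    """Formula (4.10) for j = n-2, using B(T_n, T_n, x) = 1."""
    step_sd = nu * np.sqrt(T_nm1 - T_nm2)        # sd of M_{T_{n-1}} - M_{T_{n-2}}
    x = m + step_sd * nodes                      # quadrature points for M_{T_{n-1}}
    return delta_nm1 * float(np.sum(weights * inv_bond_last(x))) + delta_n


def g_nm2_closed_form(m):
    """Closed form available only in this constant-volatility example."""
    return (delta_nm1 * (1.0 + delta_n * kappa0 * np.exp(m - 0.5 * nu ** 2 * T_nm2))
            + delta_n)


print(g_nm2(0.1), g_nm2_closed_form(0.1))
```

For earlier dates the integrand involves the functions $h_i$ and $g_i$ already found, so no closed form should be expected and the quadrature (or another numerical integration scheme) does the work.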

Our next goal is to show how to find the function $h_j$, under the assumption that the functional forms of bond prices $B(T_i, T_n, M_{T_i})$ are known for every $i = j+1, \dots, n$. To this end, we assume that we are given all market prices of digital swaptions with expiration date $T_j$ and any strictly positive strike level $\kappa$. We find it convenient to represent the price at time 0 of the $j$th digital swaption, with strike $\kappa$ and expiration date $T_j$, in the following way:$^{28}$

$$DS_0^j(\kappa) = B(0, T_n)\,\mathbb{E}_{\mathbb{P}_{T_n}}\left( \frac{G_{T_j}(n - j)}{B(T_j, T_n)}\,\mathbb{1}_{\{\tilde{\kappa}(T_j, T_j) > \kappa\}} \right)$$

for $j = 0, \dots, n-2$. Under the present assumptions, we obtain

$$DS_0^j(\kappa) = B(0, T_n)\,\mathbb{E}_{\mathbb{P}_{T_n}}\Big( g_j(M_{T_j})\,\mathbb{1}_{\{h_j(M_{T_j}) > \kappa\}} \Big),$$

or equivalently,

$$DS_0^j(\kappa) = B(0, T_n)\,\mathbb{E}_{\mathbb{P}_{T_n}}\Big( g_j(M_{T_j})\,\mathbb{1}_{\{M_{T_j} > h_j^{-1}(\kappa)\}} \Big).$$

Finally, if we denote by $f_M(x) = p_M(0, 0; T_j, x)$ the p.d.f. of $M_{T_j}$ under $\mathbb{P}_{T_n}$, then

$$DS_0^j(\kappa) = B(0, T_n) \int_{\mathbb{R}} g_j(x)\,\mathbb{1}_{\{x > \hat{h}_j(\kappa)\}}\,f_M(x)\,dx, \qquad (4.12)$$

where we write $\hat{h}_j = h_j^{-1}$. It is natural to assume$^{29}$ that the function $DS_0^j : \mathbb{R}_+ \to \mathbb{R}_+$ is strictly decreasing as a function of the strike level $\kappa$, with

$$DS_0^j(0) = \sum_{i=j+1}^{n} \delta_i B(0, T_i) = G_0(n - j)$$

and $DS_0^j(+\infty) = 0$. Since

$$\mathbb{E}_{\mathbb{P}_{T_n}}\big( g_j(M_{T_j}) \big) = G_0(n - j)\,B^{-1}(0, T_n),$$

it can be deduced from (4.12) that $\hat{h}_j(0) = -\infty$. On the other hand, condition $DS_0^j(+\infty) = 0$ implies that $\hat{h}_j(+\infty) = +\infty$. Finally, the function $\hat{h}_j$ implicitly defined through equality (4.12) is strictly increasing, so that it admits an inverse function $h_j$ with the desired properties. To wit, for $h_j = \hat{h}_j^{-1}$ we have: $h_j : \mathbb{R} \to \mathbb{R}_+$ is strictly increasing, with $h_j(-\infty) = 0$ and $h_j(+\infty) = +\infty$. This shows that the procedure above leads to a reasonable specification of the functional form $\tilde{\kappa}(T_j, T_j) = h_j(M_{T_j})$.

28 By definition, the $j$th digital swaption, with unit notional principal, pays the amount $\delta_i$ at time $T_i$ for $i = j+1, \dots, n$ whenever the inequality $\tilde{\kappa}(T_j, T_j) > \kappa$ holds.
29 Recall that the function $DS_0^j$ represents the observed market prices of digital swaptions. Therefore, the foregoing assumptions about the behaviour of this function are indeed quite natural.

For the reader's convenience, we shall recapitulate the main steps of the calibration procedure. In the first step, we numerically find the function $h_{n-2}$ which expresses $\tilde{\kappa}(T_{n-2}, T_{n-2})$ in terms of $M_{T_{n-2}}$. To this end, we need first to evaluate the function $g_{n-2}$ using formula (4.10) with $B(T_n, T_n, x) = 1$ and $B(T_{n-1}, T_n, x)$ given by (4.8). In the second step, we first determine $B(T_{n-2}, T_n, x)$ using relationship (4.11), that is,

$$B^{-1}(T_{n-2}, T_n, x) = 1 + h_{n-2}(x)\,g_{n-2}(x).$$

Then, we find $g_{n-3}$ using (4.10), and subsequently we determine the rate $\tilde{\kappa}(T_{n-3}, T_{n-3})$, or rather the corresponding function $h_{n-3}$. Continuing this procedure, we end up with the following representation of the finite family of swap rates:

$$\big( \tilde{\kappa}(T_0, T_0), \dots, \tilde{\kappa}(T_{n-1}, T_{n-1}) \big) = \big( h_0(M_{T_0}), \dots, h_{n-1}(M_{T_{n-1}}) \big).$$

This representation uniquely specifies the probability law of the considered family of swap rates under the terminal forward measure $\mathbb{P}_{T_n}$.

Remarks

In view of (4.6), the price at time $t \le T_{n-1}$ of the $(n-1)$th digital swaption equals

$$DS_t^{n-1}(\kappa) = \delta_n B(t, T_n)\,\mathbb{P}_{T_n}\{ \tilde{\kappa}(T_{n-1}, T_{n-1}) > \kappa \mid \mathcal{F}_t \},$$

that is,

$$DS_t^{n-1}(\kappa) = \delta_n B(t, T_n)\,N\big( \tilde{h}_2(t, T_{n-1}) \big), \qquad (4.13)$$

where $N$ denotes the standard Gaussian cumulative distribution function, and the coefficient $\tilde{h}_2$ is given in the formulation of Proposition 3.5. Needless to say, formula (4.13) is not valid in the present setup, even for $t = 0$, for any digital swaption with maturity $T_0, \dots, T_{n-2}$. Moreover, it is clear that assumption (4.6) is not necessary; we need only assume that the functional form of the swap rate $\tilde{\kappa}(T_{n-1}, T_{n-1})$ with respect to some underlying Markov process $M$ is explicitly known (and is a monotone function of $M_{T_{n-1}}$).
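The inversion step of the calibration, solving equality (4.12) for $\hat{h}_j(\kappa)$ given an observed digital swaption price, can be sketched on a grid. This is only one possible numerical scheme (trapezoidal tail integration followed by interpolation), not the procedure of the cited papers; the function name `calibrate_h_hat`, the grid parameters and the synthetic lognormal test prices are assumptions made for illustration, and $M_{T_j}$ is taken to be Gaussian with a known variance.

```python
import math
import numpy as np


def calibrate_h_hat(digital_prices, g, var, B0_Tn, half_width=8.0, n_grid=4001):
    """Return h_hat_j(kappa) for each observed price DS_0^j(kappa), via (4.12).

    g   : vectorised function x -> g_j(x), strictly positive
    var : variance of M_{T_j} under P_{T_n} (Gaussian M is an assumption)
    """
    sd = math.sqrt(var)
    x = np.linspace(-half_width * sd, half_width * sd, n_grid)
    f = np.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    integrand = g(x) * f
    seg = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)   # trapezoid pieces
    # tail[i] approximates B(0,T_n) * integral of g_j * f_M over (x[i], +inf)
    tail = B0_Tn * np.concatenate((np.cumsum(seg[::-1])[::-1], [0.0]))
    # tail decreases in x, so reverse both arrays for np.interp
    return np.interp(digital_prices, tail[::-1], x[::-1])


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Synthetic check: constant g_j = delta and lognormal digital prices, for which
# h(m) = kappa0 * exp(m - var/2), hence h_hat(kappa) = log(kappa/kappa0) + var/2.
kappa0, var, delta, B0_Tn = 0.04, 0.08, 0.5, 0.9
strikes = np.array([0.03, 0.04, 0.05])
prices = np.array([
    delta * B0_Tn * norm_cdf((math.log(kappa0 / k) - 0.5 * var) / math.sqrt(var))
    for k in strikes
])
h_hat = calibrate_h_hat(prices, lambda x: delta * np.ones_like(x), var, B0_Tn)
print(h_hat, np.log(strikes / kappa0) + 0.5 * var)
```

The function $h_j$ is then the inverse of the map $\kappa \mapsto \hat{h}_j(\kappa)$; on a strike grid it can be tabulated by swapping the roles of the two arrays.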


References

Andersen, L. (2000), A simple approach to the pricing of Bermudan swaptions in the multifactor LIBOR market model, Journal of Computational Finance 3(2), 5–32.
Andersen, L. and Andreasen, J. (1997), Volatility skews and extensions of the Libor market model, working paper, National Australia Bank and University of New South Wales.
Brace, A. (1996), Dual swap and swaption formulae in the normal and lognormal models, working paper, University of New South Wales.
Brace, A., Gątarek, D. and Musiela, M. (1997), The market model of interest rate dynamics, Mathematical Finance 7, 127–54.
Brace, A., Musiela, M. and Schlögl, E. (1998), A simulation algorithm based on measure relationships in the lognormal market model, working paper, University of New South Wales.
Brace, A. and Womersley, R.S. (2000), Exact fit to the swaption volatility matrix using semidefinite programming, working paper, National Australia Bank and University of New South Wales.
Bühler, W. and Käsler, J. (1989), Konsistente Anleihenpreise und Optionen auf Anleihen, working paper, University of Dortmund.
Cox, J. and Ross, S. (1976), The valuation of options for alternative stochastic processes, Journal of Financial Economics 3, 145–66.
Döberlein, F. and Schweizer, M. (1998), On term structure models generated by semimartingales, working paper, Technische Universität Berlin.
Döberlein, F., Schweizer, M. and Stricker, C. (2000), Implied savings accounts are unique, Finance and Stochastics 4, 431–42.
Dun, T., Schlögl, E. and Barton, G. (2000), Simulated swaption delta-hedging in the lognormal forward LIBOR model, working paper, University of Sydney and University of Technology, Sydney.
Flesaker, B. (1993), Arbitrage free pricing of interest rate futures and forward contracts, Journal of Futures Markets 13, 77–91.
Flesaker, B. and Hughston, L. (1996a), Positive interest, Risk 9(1), 46–9.
Flesaker, B. and Hughston, L. (1996b), Positive interest: foreign exchange, in: Vasicek and Beyond, L. Hughston, ed., Risk Publications, London, pp. 351–67.
Flesaker, B. and Hughston, L. (1997), Dynamic models of yield curve evolution, in: Mathematics of Derivative Securities, M.A.H. Dempster and S.R. Pliska, eds., Cambridge University Press, Cambridge, pp. 294–314.
Geman, H., El Karoui, N. and Rochet, J.C. (1995), Changes of numeraire, changes of probability measures and pricing of options, Journal of Applied Probability 32, 443–58.
Glasserman, P. and Kou, S.G. (1999), The term structure of simple forward rates with jump risk, working paper, Columbia University.
Glasserman, P. and Zhao, X. (1999), Fast greeks by simulation in forward LIBOR models, Journal of Computational Finance 3(1), 5–39.
Glasserman, P. and Zhao, X. (2000), Arbitrage-free discretization of lognormal forward Libor and swap rate model, Finance and Stochastics 4, 35–68.
Goldys, B. (1997), A note on pricing interest rate derivatives when Libor rates are lognormal, Finance and Stochastics 1, 345–52.
Goldys, B., Musiela, M. and Sondermann, D. (1994), Lognormality of rates and term structure models, working paper, University of New South Wales.
Heath, D., Jarrow, R. and Morton, A. (1992), Bond pricing and the term structure of interest rates: a new methodology for contingent claim valuation, Econometrica 60, 77–105.
Hull, J.C. and White, A. (1999), Forward rate volatilities, swap rate volatilities, and the implementation of the LIBOR market model, working paper, University of Toronto.
Hunt, P.J. and Kennedy, J.E. (1997), On convexity corrections, working paper, ABN-Amro Bank and University of Warwick.
Hunt, P.J. and Kennedy, J.E. (1998), Implied interest rate pricing model, Finance and Stochastics 2, 275–93.
Hunt, P.J. and Kennedy, J.E. (2000), Financial Derivatives in Theory and Practice, John Wiley & Sons, Chichester.
Hunt, P.J., Kennedy, J.E. and Pelsser, A. (2000), Markov-functional interest rate models, Finance and Stochastics 4, 391–408.
Hunt, P.J., Kennedy, J.E. and Scott, E.M. (1996), Terminal swap-rate models, working paper, ABN-Amro Bank and University of Warwick.
Jamshidian, F. (1996), Pricing and hedging European swaptions with deterministic (lognormal) forward swap rate volatility, working paper, Sakura Global Capital.
Jamshidian, F. (1997), Libor and swap market models and measures, Finance and Stochastics 1, 293–330.
Jamshidian, F. (1999), Libor market model with semimartingales, working paper, NetAnalytic Limited.
Jin, Y. and Glasserman, P. (1997), Equilibrium positive interest rates: a unified view, forthcoming in Review of Financial Studies.
Lotz, C. and Schlögl, L. (2000), Default risk in a market model, Journal of Banking and Finance 24, 301–27.
Miltersen, K., Sandmann, K. and Sondermann, D. (1997), Closed form solutions for term structure derivatives with log-normal interest rates, Journal of Finance 52, 409–30.
Musiela, M. (1994), Nominal annual rates and lognormal volatility structure, working paper, University of New South Wales.
Musiela, M. and Rutkowski, M. (1997a), Martingale Methods in Financial Modelling, Springer-Verlag, Berlin.
Musiela, M. and Rutkowski, M. (1997b), Continuous-time term structure models: forward measure approach, Finance and Stochastics 1, 261–91.
Musiela, M. and Sawa, J. (1998), Interpolation and modelling term structure, working paper, University of New South Wales.
Musiela, M. and Sondermann, D. (1993), Different dynamical specifications of the term structure of interest rates and their implications, working paper, University of Bonn.
Neuberger, A. (1990), Pricing swap options using the forward swap market, working paper, London Business School.
Rady, S. (1997), Option pricing in the presence of natural boundaries and a quadratic diffusion term, Finance and Stochastics 1, 331–44.
Rady, S. and Sandmann, K. (1994), The direct approach to debt option pricing, Review of Futures Markets 13, 461–514.
Rebonato, R. (1999), On the pricing implications of the joint lognormal assumption for the swaption and cap markets, Journal of Computational Finance 2(3), 57–76.
Rebonato, R. (2000), On the simultaneous calibration of multifactor lognormal interest rate models to Black volatilities and to the correlation matrix, Journal of Computational Finance 2(4), 5–27.
Rutkowski, M. (1997), A note on the Flesaker-Hughston model of term structure of interest rates, Applied Mathematical Finance 4, 151–63.
Rutkowski, M. (1998), Dynamics of spot, forward, and futures Libor rates, International Journal of Theoretical and Applied Finance 1, 425–45.
Rutkowski, M. (1999), Models of forward Libor and swap rates, Applied Mathematical Finance 6, 29–60.
Sandmann, K. and Sondermann, D. (1993), On the stability of lognormal interest rate models, working paper, University of Bonn.
Sandmann, K. and Sondermann, D. (1997), A note on the stability of lognormal interest rate models and the pricing of Eurodollar futures, Mathematical Finance 7, 119–25.
Sandmann, K., Sondermann, D. and Miltersen, K.R. (1995), Closed form term structure derivatives in a Heath–Jarrow–Morton model with log-normal annually compounded interest rates, in: Proceedings of the Seventh Annual European Futures Research Symposium Bonn, 1994, Chicago Board of Trade, pp. 145–65.
Schlögl, E. (1999), A multicurrency extension of the lognormal interest rate market model, working paper, University of Technology, Sydney.
Schmidt, W.M. (1996), Pricing irregular interest cash flows, working paper, Deutsche Morgan Grenfell.
Schoenmakers, J. and Coffey, B. (1999), Libor rates models, related derivatives and model calibration, working paper.
Sidenius, J. (1997), Libor market models in practice, Journal of Computational Finance 3(3), 5–26.
Uratani, T. and Utsunomiya, M. (1999), Lattice calculation for forward LIBOR model, working paper, Hosei University.
Yasuoka, T. (1998), No arbitrage relation between a swaption and a cap/floor in the framework of Brace, Gatarek and Musiela, working paper, Fuji Research Institute Corporation.
Yasuoka, T. (1999), Mathematical pseudo-completion of the BGM model, working paper, Fuji Research Institute Corporation.

Part three Risk Management and Hedging

11 Credit Risk Modelling: Intensity Based Approach Tomasz R. Bielecki and Marek Rutkowski

1 Introduction

Let B(t, T) and D(t, T) denote prices at time t of default-free and default-risky (or defaultable) zero-coupon bonds maturing at time T, respectively. The default-free bond pays $1 at time T. The (recovery) payment for the default-risky bond needs to be modelled. If the bond defaults prior to or on the maturity date, two major situations are commonly considered: (a) the recovery payment is received by the holder of the defaultable bond at the default time of the bond, or (b) the recovery payment is received by the holder of the defaultable bond at the maturity time of the bond. Of course, if the defaultable bond does not default prior to or on the maturity date, then it pays $1 at maturity.

In this chapter we present a survey of recent research efforts aimed at the pricing and hedging of default-prone debt instruments. We concentrate on intensity and ratings based approaches. In particular we review some results derived by Duffie, Schröder and Skiadas (1996), Duffie and Singleton (1998a, 1999), Jarrow and Turnbull (1995, 2000), Jarrow, Lando and Turnbull (1997), Lando (1998), Madan and Unal (1998a, 1998b), Jeanblanc and Rutkowski (2000a, 2000b), Bielecki and Rutkowski (1999, 2000), and Lotz and Schlögl (2000), among results obtained by other researchers. In addition we present a brief survey of some important types of credit derivatives, that is, derivative products linked to either corporate or sovereign debt, and we describe how to price them within the Bielecki and Rutkowski approach. It should be emphasized that the need to rationally price and hedge credit derivatives, whose presence in financial markets has been growing continuously in recent years, was one of the motivations, besides the need to manage credit risk, behind the explosion of research on quantitative aspects of credit risk observed in the 1990s.

Let us mention here that the firm-specific approach, that is, an approach based on observations of the value of the debt's issuer, is not addressed in the present chapter. This alternative approach was initiated in the 1970s by Merton (1974), Black and Cox (1976), and Geske (1977). It was subsequently developed in various directions by several authors; to mention a few: Brennan and Schwartz (1977, 1980), Pitts and Selby (1983), Rendleman (1992), Kim et al. (1993), Nielsen et al. (1993), Leland (1994), Longstaff and Schwartz (1995), Leland and Toft (1996), Mella-Barral and Tychon (1996), Briys and de Varenne (1997), Crouhy et al. (1998, 2000), Duffie and Lando (1998), and Anderson and Sundaresan (2000). Reviewing this approach would require a separate article (see, e.g., Ammann (1999)). The list of references is not representative of all important papers and books published in this area in recent years, but it includes works that are most related to this presentation.

2 Credit derivatives

Credit derivatives are privately negotiated derivative securities that are linked to a credit-sensitive asset as the underlying asset. More specifically, the reference security of a credit derivative can be an actively traded corporate or sovereign bond or a portfolio of these bonds. A credit derivative can also have a loan (or a portfolio of loans) as the underlying reference credit. Credit derivatives can be structured in a large variety of ways; they are typically complex agreements, customized to the precise needs of an investor. The common feature of all credit derivatives is the fact that they allow for the transfer of credit risk from one counterparty to another, so that they can be used to control the credit risk exposure. Credit risk refers to the possibility that a borrower will fail to service or repay a debt on time. The overall risk we are concerned with involves two components: market risk and asset-specific credit risk. In contrast to ‘standard’ interest-rate derivatives, credit derivatives allow us to isolate and handle not only the market risk, but also the firm-specific credit risk. They also provide a way to synthesize assets that are otherwise not available to a particular investor (in this application, an investor ‘buys’, rather than ‘sells’, a specific credit risk).

As in the case of derivative securities associated with the risk-free term structure, we may formally distinguish three main types of agreements: forward contracts, swaps, and options. A forward contract commits the buyer to purchasing a specified bond at a specified future date at a price predetermined at contract inception. In a forward contract, the default risk is normally borne by the buyer. If a credit event occurs, the transaction is marked to market and unwound. Forward contracts can also be transacted in spread form; that is, the agreement can be based on the specified bond's spread over a benchmark asset. It should be stressed that the classification above does not correspond to market terminological conventions, as described below.


In market practice, the most popular credit-sensitive swap contract is a total rate of return swap, explained in some detail in Section 2.1 below. Credit options are typically embedded in complex credit-sensitive agreements, though over-the-counter traded credit options, such as default puts (also described in Section 2.1), are also available. Let us finally mention the so-called vulnerable options, or more generally, vulnerable claims. These are contingent agreements that are issued by credit-sensitive institutions, so that they are subject to default in much the same way as defaultable bonds.

2.1 Overview of instruments

We first review the most actively traded types of credit-sensitive agreements.$^{1}$ It should be stressed that we do not intend to examine here all aspects of credit derivatives as a tool in risk management. The non-exhaustive list of examples given below makes it clear that a wide range of objectives can be achieved by trading in credit derivatives. For an extensive analysis of the economic reasons which support the use of these products, we refer to Das (1998a, 1998b) or Tavakoli (1998).

1 Let us mention that the terminological conventions relative to credit derivatives are not yet fully standardized; we shall try to follow the most widely accepted terminology.

Total rate of return swaps

Total rate of return swaps (total return swaps, for short) are agreements in which the total return of an underlying credit-sensitive asset (basket of assets, index, etc.) is exchanged for some other cash flow. More specifically, one party agrees to pay the total return (income plus or minus any change in the capital value) on a notional principal amount to another party in return for periodic fixed or floating-rate payments on the same notional amount. Let us enumerate the most important features of a total return swap: (a) no principal amounts are exchanged and no physical change of ownership occurs, (b) the maturity of the total return swap agreement need not match that of the underlying, (c) at the contract termination (i.e., at the contract maturity or upon default), according to Das (1998a), ‘a price settlement based on the change in the value of the bond or loan is made’. Total return swaps can incorporate put and call options (to establish caps and floors on the returns of the reference assets), as well as caps and floors on floating interest rates.

Credit-spread swaps and options

With credit-spread swaps (that is, relative performance total return swaps), also known as credit-spread forwards, investors pay the total return of one asset while receiving the total return of another credit-sensitive asset. Credit-spread options are option agreements whose payoff is associated with the yield differential of two credit-sensitive assets. For instance, the reference rate of the option can be a spread of a corporate bond over a benchmark asset of comparable maturity. The option can be settled either in cash or through physical delivery of the underlying bond, at a price whose yield spread over the benchmark asset equals the strike spread. Options on credit spreads allow one to isolate the firm-specific credit risk from the market risk.

Credit (default) swaps

These are agreements in which periodic fixed payments (or an upfront fee) from the protection buyer are exchanged for the promise of some specified payment from the protection seller, to be made only if a particular, predetermined credit event occurs. If, during the term of the default swap, a credit event occurs, the seller pays the buyer an amount to cover the loss, and the swap then terminates. If no credit event has occurred by maturity of the swap, both sides end their obligations to each other. The most important covenants of a credit swap contract are: (a) the specification of the credit event, which is formally defined as a ‘default’ (in practice, it may include: bankruptcy, insolvency, payment default, a stipulated price decline for the reference asset, or a rating downgrade for the reference asset), (b) the contingent default payment, which may be structured in a number of ways; for instance, it may be linked to the price movement of the reference asset, or it can be set at a predetermined level (e.g., a fixed percentage of the notional amount of the transaction), (c) the specification of periodic payments which depend, in large part, on the credit quality of the reference asset. Credit swaps are usually settled in cash, but the agreement may also provide for physical delivery; for example, it may involve payment at par by the seller in exchange for the delivery of the defaulted reference asset. If the payment is triggered by the default and equals the difference between the face value of a bond and its market price, the contract is named the default swap. Let us finally mention the so-called first-to-default swaps, which are examples of basket default swaps (i.e., default swaps linked to a portfolio of credit-sensitive securities).

Credit (default) options

A credit call (put, resp.) option gives the right to buy (to sell, resp.) an underlying credit-sensitive asset (index, credit spread, etc.) at a predetermined price. The most widely used type of credit option is a default put. The buyer of the default put pays a premium (either an upfront fee or a periodic payment) to the seller who then assumes the default risk for the reference asset. If there is a credit (default) event during the term of the option, the seller pays the buyer a (fixed or variable) default payment.


Credit-linked notes

Credit-linked notes are debt instruments in which the coupon or price of the note is linked to the performance of a reference credit-sensitive asset (rate or index). For instance, a credit-linked note may stipulate that the principal repayment is reduced to a certain level below par if the external corporate or sovereign debt defaults before the maturity of the note. This means that the buyer of the note sells credit protection to the issuer of the note; in exchange the note pays a higher-than-normal yield.

2.2 Market pricing methods

Since a reliable benchmark model for credit derivatives is not yet available, it is common in market practice to value a credit derivative on a stand-alone basis, using a judiciously chosen ad hoc approach, rather than a sophisticated mathematical model. We shall review the most widely used of these approaches. For explanatory purposes, we focus on the valuation of a default swap, and we base our description of the pricing methods² on BeSaw (1997).

² For an exhaustive analysis of practical aspects of credit swaps and a review of non-technical methods of their valuation (including the estimation of hazard rates), we refer to Duffie (1999).

Same-cost as reference method

To estimate the price of a default swap, one assumes that there exists an insured bond which is otherwise identical to the reference bond of the swap. The spread between the yield of the insured bond and that of the reference bond can then be taken as a proxy for the default swap price. Notice that this method identifies a default swap with bond insurance, and disregards the credit difference between the bond insurer and the default swap counterparty.

Credit-spread-based method

This method of default swap valuation is based on a comparison of the yield of the reference bond and the yield of a risk-free bond with similar maturity. It is thus implicitly assumed that the spread over the risk-free asset is entirely due to the credit risk, so that the impact of tax and/or liquidity effects is neglected. Another difficulty arises when one wishes to price a swap whose maturity does not correspond to the maturity of the reference corporate bond.

Replication of cost method

In this method, the price of a default swap is calculated through evaluation of the cost of a portfolio necessary to replicate the swap. The replication of cost method thus mimics the standard approach to contingent claims valuation in an arbitrage-free setup. Unfortunately, it is typically either impossible or too costly to establish a (static or dynamic) portfolio which fully hedges (i.e. replicates) a credit derivative.

Ratings-based default method

This approach, which will be analysed in more detail in what follows, determines the price of a credit derivative (for instance, a default swap) as the expected loss resulting from default. To derive default probabilities, it is common to model the ratings migration process as a Markov chain, using the estimated credit ratings transition matrix. If the valuation is made on a stand-alone basis, it would be more appropriate to use the firm-specific transition matrix corresponding to the reference asset. Such a matrix is not easily available, however. Similarly, constant (or random) recovery rates, which are needed to evaluate the expected loss, are either inferred using the historical data, or assessed on a stand-alone basis.

The credit-spread-based default method can be seen as a variant of the ratings-based default method. It uses an issuer-specific credit spread over default-free instruments of similar maturity to estimate the probability of default and the expected recovery rate in default.
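The ratings-based default method can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the text: the three-state transition matrix, the recovery rate, the flat discount rate, the convention that default is recognised at year end, and the function names. The sketch reads cumulative default probabilities off the powers of a one-year transition matrix and combines them with a loss-given-default to obtain an expected discounted loss; it makes no adjustment for risk premia.

```python
# Illustrative sketch only: all numbers below are hypothetical.
# States: 0 = rating A, 1 = rating B, 2 = default (absorbing).
P = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.00, 0.00, 1.00],
]

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cumulative_default_probs(p, start, years):
    """Probability of having defaulted by the end of years 1..years,
    starting from rating `start`, read off the powers of `p`."""
    default_state = len(p) - 1
    probs, power = [], p
    for _ in range(years):
        probs.append(power[start][default_state])
        power = mat_mul(power, p)
    return probs

def expected_discounted_loss(p, start, years, recovery, rate, notional=1.0):
    """Sum over years of (marginal default probability) x (loss given default)
    x (discount factor), with default assumed to occur at year end."""
    cum = cumulative_default_probs(p, start, years)
    loss, prev = 0.0, 0.0
    for k, c in enumerate(cum, start=1):
        loss += (c - prev) * (1.0 - recovery) * notional / (1.0 + rate) ** k
        prev = c
    return loss

pd = cumulative_default_probs(P, start=0, years=5)
el = expected_discounted_loss(P, start=0, years=5, recovery=0.4, rate=0.05)
print([round(x, 4) for x in pd])
print(round(el, 4))
```

Replacing `P` by a firm-specific matrix, where one is available, changes nothing else in the calculation.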

3 Valuation of defaultable claims

The exposition in this section is mainly based on Duffie et al. (1996). Our goal is to present the most fundamental results which can be obtained using the intensity-based approach. In Section 4, special attention will be paid to the various kinds of recovery rates, such as, for instance, zero recovery, fractional recovery of par, and fractional recovery of market value. On the other hand, in order to obtain valuation formulae that are as explicit as possible, we shall still assume that only two states are possible, namely, non-default and default. An analysis of the case of several credit rating classes is postponed to Sections 5–7. We make the following standing assumptions.

(A.1) We are given a probability space $(\Omega, \mathcal{G}, \mathbb{P}^*)$, endowed with the filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{R}_+}$ (of course, $\mathcal{F}_t \subset \mathcal{G}$ for every $t \in \mathbb{R}_+$). The probability measure $\mathbb{P}^*$ is interpreted as a martingale measure for our underlying securities market model (complete or not). Let $\tau$ be a non-negative random variable on the probability space $(\Omega, \mathcal{G}, \mathbb{P}^*)$. In what follows, we shall refer to $\tau$ as the default time. For convenience, we assume that $\mathbb{P}^*\{\tau = 0\} = 0$ and $\mathbb{P}^*\{\tau > t\} > 0$ for every $t \in \mathbb{R}_+$. Given a default time $\tau$, we introduce the associated (single) jump process $H$ by setting $H_t = \mathbf{1}_{\{\tau \le t\}}$ for $t \in \mathbb{R}_+$. It is obvious that $H$ is a right-continuous process. Let $\mathbb{H}$ be the filtration generated by the process $H$, i.e., $\mathcal{H}_t = \sigma(H_u : u \le t)$. We introduce the enlarged filtration $\mathbb{G}$ which satisfies $\mathbb{G} = \mathbb{H} \vee \mathbb{F}$, that is, $\mathcal{G}_t = \mathcal{H}_t \vee \mathcal{F}_t = \sigma(\mathcal{H}_t, \mathcal{F}_t)$ for every $t$.

(A.2) For a given default-risky security, its default process is modelled through a jump process $H$ with strictly positive intensity (or hazard rate) process³ $\lambda$ under $\mathbb{P}^*$. The intensity $\lambda$ is an $\mathbb{F}$-progressively measurable process such that the compensated process

$$ M_t := H_t - \int_0^{t \wedge \tau} \lambda_u \, du = H_t - \int_0^t h_u \, du, \quad \forall\, t \in [0, T^*], \tag{3.1} $$

follows a $\mathbb{G}$-martingale under $\mathbb{P}^*$. Notice that the auxiliary $\mathbb{G}$-adapted process $h$ satisfies $h_t := \mathbf{1}_{\{t \le \tau\}} \lambda_t$.

³ We refer to Artzner and Delbaen (1995), Kusuoka (1999), Rutkowski (1999), Elliott et al. (2000) or Jeanblanc and Rutkowski (2000a, 2000b) for more details on stochastic intensities.

Remarks. Let us stress that the stochastic intensity $\lambda$ is assumed to follow an $\mathbb{F}$-adapted process, and the filtration of reference $\mathbb{F}$ can be strictly smaller than $\mathbb{G}$, in general. On the other hand, the case of an $\mathbb{F}$-stopping time is also covered (in this case, $\mathbb{F} = \mathbb{G}$).

(A.3) Given a maturity date $T > 0$, an $\mathcal{F}_T$-measurable random variable $X$ represents the promised claim, that is, the amount of cash which the owner of a defaultable claim is entitled to receive at time $T$, provided that the default has not occurred before the maturity date $T$.

(A.4) An $\mathbb{F}$-predictable process $Z$ models the payoff which is actually received by the owner of a defaultable claim, if default occurs before maturity $T$. We shall refer to $Z$ as the recovery process of $X$.

(A.5) An $\mathbb{F}$-adapted process $r$ stands for the short-term interest rate, and $B_t := \exp\big( \int_0^t r_u \, du \big)$, $t \in \mathbb{R}_+$, is the associated savings account process.

The main result in the intensity-based approach states that a defaultable security can be priced as if it were a default-risk-free security, provided that the credit spread is already incorporated in the risk premium. In other words, the risk premium process of a defaultable security differs from that associated with a risk-free bond, both in the real-world and in the risk-neutral world. In particular, in a risk-neutral world the risk premium associated with a risk-free bond vanishes, but the risk premium associated with a defaultable security is still present.
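As a hedged illustration of assumption (A.2) and of the pricing statement above, consider the simplest special case. The special case is an assumption made here for illustration, not a setting singled out in the text: constant interest rate $r$, constant intensity $\lambda > 0$, promised claim $X = 1$ and zero recovery ($Z = 0$). Taking expectations in (3.1) gives the survival probability, and discounting the promised payoff on the no-default event shows the intensity entering as a spread added to the short rate.

```latex
% Special case (assumed for illustration): r, \lambda constant, X = 1, Z = 0.
% Step 1: take expectations in (3.1), using M_0 = 0 and the martingale property.
\begin{align*}
\mathbb{P}^*\{\tau \le t\} = \mathbb{E}_{\mathbb{P}^*}(H_t)
  &= \lambda \int_0^t \mathbb{P}^*\{\tau \ge u\}\, du , \\
\text{hence}\quad G(t) := \mathbb{P}^*\{\tau > t\}
  &= 1 - \lambda \int_0^t G(u)\, du
  \;\Longrightarrow\; G(t) = e^{-\lambda t}.
\end{align*}
% Step 2: discount the promised payoff on the event of no default before T.
\begin{align*}
\mathbb{E}_{\mathbb{P}^*}\big( B_T^{-1}\, \mathbf{1}_{\{\tau > T\}} \big)
  = e^{-rT}\, \mathbb{P}^*\{\tau > T\}
  = e^{-(r + \lambda) T}.
\end{align*}
```

In this special case the defaultable payoff is valued like a default-free one, with $r$ replaced by $r + \lambda$.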


Example 3.1 If the intensity process $\lambda_t = \lambda > 0$ is constant, the process $H$ can be seen as a continuous-time Markov chain with the state space $\{0, 1\}$, and with constant intensity matrix $\Lambda = [\lambda_{ij}]_{0 \le i,j \le 1}$, where $\lambda_{00} = -\lambda$, $\lambda_{01} = \lambda$, and $\lambda_{1i} = 0$ for $i = 0, 1$ (so that the state 1 is absorbing). In this case, $\tau$ can be seen as the first jump time of a standard Poisson process $N$ with constant intensity $\lambda$. This simple example can be generalized in two directions. First, in some circumstances it might be natural to assume that $\lambda_t = \lambda(Y_t)$, where $Y$ is a given $k$-dimensional $\mathbb{F}$-adapted stochastic process, and $\lambda : \mathbb{R}^k \to \mathbb{R}_+$ is a positive deterministic function. Second, the basic model can be extended to accommodate different credit rating classes, $\Lambda_t = [\lambda_{ij}(Y_t)]_{0 \le i,j \le K}$, with $K$ being an absorbing state (see, e.g., Jarrow et al. (1997) or Section 6).

We first need to define formally the value process $S$ of a (European) defaultable claim, represented by a triplet $(X, Z, \tau)$ and maturity date $T$. Since we assume throughout that $\mathbb{P}^*$ is a spot martingale measure, it is natural to postulate that the value $S_0$ at time 0 of a defaultable claim $(X, Z, \tau)$ equals

$$ S_0 := B_0 \, \mathbb{E}_{\mathbb{P}^*} \Big( \int_{]0,T]} B_u^{-1} \, dD_u \Big), \tag{3.2} $$

where $B$ stands for the savings account process, and $D$ is the ‘dividend process’ (cf. (A.3)–(A.4))

$$ D_t = \int_{]0,t]} Z_u \, dH_u + X (1 - H_T) \mathbf{1}_{\{t = T\}}. \tag{3.3} $$

Formula (3.2) can be easily generalized to give the price of a defaultable claim at any date $t$, namely

$$ S_t := B_t \, \mathbb{E}_{\mathbb{P}^*} \Big( \int_{]t,T]} B_u^{-1} \, dD_u \,\Big|\, \mathcal{G}_t \Big), \tag{3.4} $$

or equivalently,

$$ S_t := B_t \, \mathbb{E}_{\mathbb{P}^*} \Big( \int_{]t,T]} B_u^{-1} Z_u \, dH_u + B_T^{-1} X \mathbf{1}_{\{T < \tau\}} \,\Big|\, \mathcal{G}_t \Big). $$
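Formulas (3.2)–(3.3) can be checked numerically in the constant-intensity setting of Example 3.1. The following is a minimal sketch under assumptions that are not in the text: a constant short rate `r`, a constant intensity `lam`, a constant recovery payment `delta` made at the default time, and a promised claim X = 1. All parameter values and function names are hypothetical. The sketch compares a Monte Carlo estimate of the time-0 value with the closed-form value obtained by integrating against the exponential law of the default time.

```python
import math
import random

def closed_form_value(r, lam, delta, T):
    """Time-0 value for X = 1 and recovery delta paid at default, with constant
    r and lam: delta * lam/(r+lam) * (1 - exp(-(r+lam)T)) + exp(-(r+lam)T)."""
    a = r + lam
    return delta * lam / a * (1.0 - math.exp(-a * T)) + math.exp(-a * T)

def monte_carlo_value(r, lam, delta, T, n_paths, seed=0):
    """Estimate E[ delta * exp(-r*tau) * 1{tau <= T} + exp(-r*T) * 1{tau > T} ]
    by sampling tau from the exponential distribution with rate lam."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        tau = rng.expovariate(lam)                 # simulated default time
        if tau <= T:
            total += delta * math.exp(-r * tau)    # recovery, discounted from tau
        else:
            total += math.exp(-r * T)              # promised claim, discounted from T
    return total / n_paths

# Hypothetical parameters, for illustration only.
r, lam, delta, T = 0.03, 0.02, 0.4, 5.0
exact = closed_form_value(r, lam, delta, T)
estimate = monte_carlo_value(r, lam, delta, T, n_paths=200_000)
print(f"closed form: {exact:.4f}, Monte Carlo: {estimate:.4f}")
```

Setting `delta = 0` reduces the closed form to `exp(-(r + lam) * T)`, the zero-recovery value.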


Contents

List of Contributors
Introduction

Part one: Option Pricing: Theory and Practice
1 Arbitrage Theory, Yu. M. Kabanov
2 Market Models with Frictions: Arbitrage and Pricing Issues, E. Jouini and C. Napp
3 American Options: Symmetry Properties, J. Detemple
4 Purely Discontinuous Asset Price Processes, D. B. Madan
5 Latent Variable Models for Stochastic Discount Factors, R. Garcia and É. Renault
6 Monte Carlo Methods for Security Pricing, P. Boyle, M. Broadie and P. Glasserman

Part two: Interest Rate Modeling
7 A Geometric View of Interest Rate Theory, T. Björk
8 Towards a Central Interest Rate Model, A. Brace, T. Dun and G. Barton
9 Infinite Dimensional Diffusions, Kolmogorov Equations and Interest Rate Models, B. Goldys and M. Musiela
10 Modelling of Forward Libor and Swap Rates, M. Rutkowski

Part three: Risk Management and Hedging
11 Credit Risk Modelling: Intensity Based Approach, T. R. Bielecki and M. Rutkowski
12 Towards a Theory of Volatility Trading, P. Carr and D. Madan
13 Shortfall Risk in Long-Term Hedging with Short-Term Futures Contracts, P. Glasserman
14 Numerical Comparison of Local Risk-Minimisation and Mean-Variance Hedging, D. Heath, E. Platen and M. Schweizer
15 A Guided Tour through Quadratic Hedging Approaches, M. Schweizer

Part four: Utility Maximization
16 Theory of Portfolio Optimization in Markets with Frictions, J. Cvitanić
17 Bayesian Adaptive Portfolio Optimization, I. Karatzas and X. Zhao

Contributors

G. Barton, Department of Chemical Engineering, University of Sydney, Sydney, Australia.
T. Bielecki, Department of Mathematics, The Northeastern Illinois University, Chicago, USA.
T. Björk, Department of Finance, Stockholm School of Economics, Box 6501, S-11383 Stockholm, Sweden.
P. Boyle, School of Accountancy, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada.
Alan Brace, FMMA and NAB, PO Box 731, Grosvenor Place, Sydney 2000, Australia.
M. Broadie, Graduate School of Business, Columbia University, New York, NY 10027, USA.
P. Carr, Morgan Stanley, 1585 Broadway, 6th floor, New York, NY 10036, USA.
J. Cvitanić, Department of Mathematics, University of Southern California, 1042 West 36th Place, Los Angeles, CA 90089-1113, USA.
J. Detemple, School of Management, Boston University, 595 Commonwealth Avenue, Boston, MA 02215, USA.
T. Dun, Department of Chemical Engineering, University of Sydney, Sydney, Australia.
R. Garcia, Département de Sciences Économiques, Université de Montréal, Montréal (PQ) H3C 3J7, Canada.
P. Glasserman, Columbia Business School, Columbia University, New York, NY 10027, USA.
B. Goldys, School of Mathematics, University of New South Wales, Sydney, 2052 NSW, Australia.
D. Heath, University of Technology, Sydney, School of Finance & Economics, PO Box 123, Broadway, 2007 NSW, Australia.
E. Jouini, Université Paris IX Dauphine, CEREMADE, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France.
Yu. M. Kabanov, Laboratoire de Mathématiques, Université de Franche-Comté, 16 Route de Gray, F-25030 Besançon, Cedex, France.
I. Karatzas, Departments of Mathematics and Statistics, Columbia University, New York, NY 10027, USA.
D. Madan, College of Business and Management, University of Maryland, College Park, MD 20742, USA.
M. Musiela, Paribas, 10 Harewood Avenue, London NW1 6AA, UK.
C. Napp, Université Paris IX Dauphine, CEREMADE, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France.
E. Platen, University of Technology, Sydney, School of Finance & Economics, PO Box 123, Broadway, 2007 NSW, Australia.
E. Renault, Département de Sciences Économiques, Université de Montréal, Montréal (PQ) H3C 3J7, Canada.
M. Rutkowski, Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-661 Warsaw, Poland.
M. Schweizer, Technische Universität Berlin, Fachbereich Mathematik, Strasse des 17. Juni 136, D-10623, Berlin, Germany.
X. Zhao, Departments of Mathematics and Statistics, Columbia University, New York, NY 10027, USA.

Introduction

This book, the final in a series of stand-alone works, is a collection of invited papers that represent the current state of research in the field of Mathematical Finance, as seen by leading researchers in the field. Some of the contributed articles survey the existing results for a given topic, some discuss and present new research, some point out open problems and future directions, while many do all of the above. While effort was made to cover most of the important topics in the field, the book is not meant to be encyclopedic in nature. The outcome was ultimately influenced by the present scientific interest of the contributors and the editors. The primary audience are researchers in academia and industry who already have some basic knowledge of the field. This book might serve as a quick introduction to a specific topic, leading to recent results and open problems. It can also serve as valuable reference material.

The first Part focuses on the theory and practice of pricing derivative securities. The paper “Arbitrage theory” by Y. Kabanov considers models where an investor, acting on a financial market with random price movements and having a given time horizon, subsequently transforms his initial endowment into a certain terminal wealth. In this framework, the author answers the following question: whether the investor has arbitrage opportunities, i.e. non-risky profits. The article examines and gives an answer to this question in different frameworks: one-step and multi-step models with finite space of possible states of the world, discrete-time models with infinite space of possible states of the world, continuous time models, semimartingale models, large financial markets and models with transaction costs.

The article “Market models with frictions: arbitrage and pricing issues” by E. Jouini and C. Napp extends the previous results in two directions: first, they consider investment opportunities determined by their cash-flows instead of financial assets described by their price processes. This approach enables them to take into account classical market models as well as investment models. Second, the authors consider a wide range of possible market imperfections: transaction
costs, borrowing costs and constraints, short-selling costs and constraints, fixed and proportional transaction costs and models with defaultable numéraire. In all these cases, they characterize the no-arbitrage assumption through a unified approach and they apply these results to pricing and hedging issues.

The contribution by J. Detemple “American options: symmetry properties” surveys generalizations of the classical put–call symmetry: the value of a put option with strike price K on an underlying asset S paying dividends at rate δ in a financial market with riskless interest rate r is the same as the value of a call option with strike price S on an asset paying dividends at rate r and having initial value K, in an auxiliary financial market with interest rate δ. It is shown that the symmetry holds in a large class of models, including nonmarkovian markets with random coefficients, and even for many nonstandard American claims including barrier options, multi-asset derivatives, and occupation time derivatives. The main tool, change of numéraire technique, is also reviewed and extended to the case of dividend-paying assets. The put–call symmetry reduces the computational burden in pricing options; it provides useful insights into the economic relationship between contracts, and sometimes even helps to reduce the dimensionality of the problem, thereby making somewhat more tractable the difficult problem of evaluating American contingent claims.

The article “Monte Carlo methods for security pricing” by P. Boyle, M. Broadie and P. Glasserman, reprinted from Journal of Economic Dynamics and Control, is a detailed survey of simulation methods applied to numerical pricing of European, and, more recently, American options. Since European option prices can be calculated as expected values, it is natural to use Monte Carlo for computing them. However, this can often be quite slow, and this paper reviews and compares different methods used to improve the efficiency of Monte Carlo methods. So-called “variance reduction” techniques are surveyed, including control variates, antithetic variates, moment matching, importance sampling and conditional Monte Carlo methods. Next, the quasi-Monte Carlo approach is reviewed, in which, instead of random numbers, deterministic sequences are generated – so-called quasi-random numbers or low-discrepancy sequences. These are more evenly dispersed than random sequences. It is interesting that these procedures are typically based on number-theoretic methods. The paper also discusses the use of Monte Carlo methods for computing sensitivities (“Greeks”) of the option price with respect to different parameters, and the difficult problem of computing American option prices using simulation. The difficulty stems from the fact that the price of an American option is a maximum of expected values, rather than a single expected value.

In their chapter, R. Garcia and E. Renault use the concept of stochastic discount factor (SDF) or pricing kernel as a unifying principle to integrate two concepts
of latent variables, one cross-sectional, one longitudinal, in order to reduce the dimension of a statistical model specified for a multivariate time series of asset prices. In the CAPM or APT beta pricing models, the dimension reduction is cross-sectional in nature, while in time-series state-space models, dimension is reduced longitudinally by assuming conditional independence between consecutive returns given a small number of state variables. They provide this unifying analysis in the context of conditional equilibrium beta pricing as well as asset pricing with stochastic volatility, stochastic interest rates and other state variables. They address the general issue of econometric specifications of dynamic asset pricing models, which cover the modern literature on conditionally heteroskedastic factor models as well as equilibrium-based asset pricing models with an intertemporal specification of preferences and market fundamentals.

D. Madan, in his contribution “Purely discontinuous asset price processes” surveys his work with various co-authors on modeling asset prices with pure jump processes, and on pricing contingent claims in such models. It is argued that statistical analysis leads to the consideration of discontinuous asset prices models, in which the arrival rate of jumps is infinite and decreasing in the jump size. Such models are also motivated by theoretical no-arbitrage considerations, implying that the prices must be modeled as time-changed Brownian motion. If, as is argued, this time change has to be modeled as random, we are led to the class of discontinuous price processes. Being of bounded variation, these prices are also more robust relative to change of parameters than the typical diffusion models. The example of the so-called variance gamma process is presented in detail, including solutions to option pricing and optimal investment problems in such a market model. Using these solutions, the model is calibrated, which is in turn used to infer trader preferences and personalized risk neutral measures, called position measures. The paper is representative of a very active field of research, rich in theoretical and practical implications.

Part II presents different aspects of the theory and practice of interest rate modeling. Arbitrage-free movement of the forward curve is analyzed from the perspective of infinite dimensional diffusions by T. Björk in his article “A geometric view of interest rate theory”. He addresses the following questions: when is a given forward rate model consistent with a given family of forward rate curves and when can the inherently infinite dimensional forward rate process be realized by means of a finite-dimensional state space model? Necessary and sufficient conditions for consistency as well as for the existence of finite-dimensional realizations are given in terms of forward rate volatilities. That is, the forward rate model generated by a collection of volatility functions admits a finite dimensional realization if and only if the corresponding Lie algebra generated by the volatility functions and the
drift (which is also uniquely determined from the volatility functions by arbitrage considerations) is finite-dimensional in the neighbourhood of the initial condition. General consistency results are not given in this chapter, though references are made to the recent papers and the PhD thesis by D. Filipovic. Instead, the author concentrates on analysis of the Nelson–Siegel (NS) family of forward curves. It turns out that neither the Hull–White (HW) nor the Ho–Lee (HL) model is consistent with the NS family. In fact the NS manifold is too small for the HW and HL models, in the sense that if the initial curve is on the manifold, then the models will force the term structure off the manifold within an arbitrarily short period of time.

The infinite-dimensional approach is also taken in the chapter: “Infinite dimensional diffusions, Kolmogorov equations and interest rate models” by B. Goldys and M. Musiela. The main emphasis is put on differential analysis in infinite dimension. Motivation comes from the need for a better understanding of interest rate risk management issues. To be more precise let us look first at the Black–Scholes model. The lognormal diffusion process generating arbitrage free evolution of the variable of interest can also be represented by associating it with an infinitesimal generator. Pricing of options is identical to solving the related Kolmogorov equation. Sensitivity to the change in the stochastic variable is computed by simple differentiation of the price. The situation in the interest rate area is more complex. The underlying stochastic variable is the entire forward curve. The diffusion process defining the evolution of the forward curve is infinite-dimensional. The infinitesimal generator and the corresponding Kolmogorov equation need to be defined and studied from the perspective of the sensitivity of an interest rate option to the changes in the shape of the forward curve. It turns out that one can obtain Feynman–Kac representations of solutions to such equations for a large class of terminal conditions (which include most of the treated products) and that for those the price is differentiable with respect to the initial forward curve. This is in contrast with poor smoothing properties of the associated semigroup and the fact that not all the payoffs have discounted expected values which are Fréchet differentiable.

While continuous compounding associated with the continuous tenor models may ultimately lead to more unified infinite-dimensional theories of the forward curve dynamics, at the implementation level one is almost forced to work with models allowing for finite-dimensional realizations. On the other hand, simple compounding corresponding to a given discrete tenor structure has the advantage of being grounded on standard finite-dimensional semimartingale theory, which is better understood and more developed. Additionally, it represents the interest rate markets more realistically. As such, it is arguably better suited for the pricing of most Libor and swap derivatives. The canonical forward Libor and swap rate models with deterministic volatilities are by construction
finite-dimensional diffusions under any of the Libor measures (spot or forward). The explicit relationships between the measures allow for the development of exact expressions or at least of good analytic approximations to a number of options such as caps and swaptions. The chapter: “Modelling of forward Libor and swap rates” by M. Rutkowski presents an overview of recently developed methodologies related to the derivation and analysis of the arbitrage free dynamics of such market rates.

The article: “Towards a central interest rate model” by A. Brace, T. Dun and G. Barton aims to expose issues related with implementation of the canonical lognormal forward Libor model. The pricing of swaptions is examined within this framework and compared to the industry standard Black swaption formula, and, by extension, to the lognormal swap rate model. Swap and swaption behaviour are investigated under arbitrary volatility and yield curve specifications. Simulation and approximation techniques are used to make comparisons in terms of observed swap rate probability distributions, swaption volatilities and prices, and swaption sensitivities defined in terms of the swap rate. Fifteen swaptions and two volatility structures are considered. Swap rates simulated under the lognormal Libor model are shown to be statistically lognormal in each case, and volatilities, prices and Greeks agree closely. Finally, the approximate delta value within the lognormal Libor model is used in a simulated delta-hedging exercise and is seen to successfully hedge Libor model swaptions. This points to the robustness of the lognormal Libor model for the following two reasons. Firstly, the exact delta of a swaption, in a lognormal Libor model, is, in fact, the vector of partial derivatives of the swaption price with respect to the underlying forward Libor rates. Secondly, the volatility of the forward swap rate under the corresponding forward swap rate measure, in the lognormal Libor model, is stochastic. Overall, in the authors’ opinion, the forward Libor model is the unifying model capable of encompassing the properties of the swap rate model and allowing for greater aggregation of risk in portfolios containing Libor and swap derivatives.

The third Part considers different types of risk in financial markets, and ways to manage and hedge exposure to risk. “Credit risk modelling: an intensity based approach” by T. Bielecki and M. Rutkowski reviews fundamental methodologies and results in the area of the intensity based default and credit risk modeling. Special care is devoted to the technical issues of the role of conditioning information in computations involving random times. The time of default is modeled via a jump process with positive jump intensity. An overview of credit-risk instruments is provided, together with market methods for pricing them. Next, the basic theory of valuation of defaultable claims is presented, and various specifications for modeling recovery value at or after the time of default are discussed. Moreover, models that account for the migration between credit-rating grades are surveyed, both in discrete-time and continuous-time. A credit-spread based HJM-type model
is presented, in which default-free and defaultable term structure is modeled. Finally, the theory is applied to the problem of valuation of some common credit derivatives. The area of credit and default risk has been very active and popular in recent years, both in financial industry practice and in academic research.

The primary purpose of the article: “Towards a theory of volatility trading” by P. Carr and D. Madan is to review three methods which recently emerged for trading realized volatility. The first method involves taking a static position in options. The classic example is that of a long position in a straddle. The second method involves delta-hedging of an option. If an investor is successful in hedging away the price risk, then a prime determinant of the profit or loss from this strategy is the difference between the realized volatility and the anticipated volatility used in pricing and hedging the option. The final method reviewed for trading realized volatility involves buying or selling an over-the-counter contract whose payoff is an explicit function of volatility. The simplest example of such a volatility contract is a volatility swap. This contract pays the buyer the difference between the realized volatility and a level of volatility fixed at the outset of the contract. A secondary purpose is to uncover the link between volatility contracts and some recent ground-breaking work by Dupire and Derman, Kani, and Kamal. By restricting the set of times and price levels for which returns are used in volatility calculations, one can synthesize a contract which pays off the “local volatility”.

The contribution by P. Glasserman, “Shortfall risk in long-term hedging with short-term futures contracts” proposes and analyzes a measure of the risk of a cash shortfall in hedging a risky position over time. The measure is illustrated by comparing various hedging strategies for a firm hedging a long-term commitment with short-dated futures contracts. It is motivated by the infamous case of derivatives losses suffered by Metallgesellschaft Refining and Marketing. The firm had entered into long-term contracts to supply oil at fixed prices, and was hedging these commitments with short-term futures contracts. While the strategy would have produced, at least theoretically, a perfect hedge at the end of the long-term contract, it led to a severe cash shortfall during the life of the contract. In a Gaussian model the theory of Gaussian extremes and large deviation approximations are used to calculate this measure, to capture qualitative features of the shortfall risk and to identify the most likely path to a shortfall under different hedging strategies. A brief summary of concepts pertinent to futures and forwards is provided in an appendix. The theory for analyzing liquidity risks is only in its infancy, and this paper indicates some possible ways for making progress in developing it.

M. Schweizer’s contribution “A guided tour through quadratic hedging approaches” gives an overview of the general theory of pricing and hedging contingent claims in incomplete markets by means of a quadratic criterion. It is
based on numerous papers by the author and his co-workers. It is an example of an abstract theory developed for very practical problems, since many models used in practice are, indeed, incomplete. The paper explains the notions of local risk-minimization, the minimal martingale measure, the variance-optimal martingale measure, mean-variance hedging, Föllmer–Schweizer decomposition, and so on. It first discusses the case in which the hedging strategies are not required to be self-financing. If the discounted price process is a local martingale, one can find a risk-minimizing strategy, which is also mean self-financing. In the general case, one can only find so-called locally risk-minimizing strategies. In the last part of the article, the mean-variance criterion is considered for those strategies that are required to be self-financing, and the connection to closedness properties of spaces of stochastic integrals is studied. Despite the significant progress that has been made on these problems over the years, and the success of complete characterization of solutions in special cases, in general, questions about how to actually construct optimal strategies remain open, and the search for those solutions is still ongoing.

The companion chapter “Numerical comparison of local risk-minimization and mean-variance hedging” by D. Heath, E. Platen and M. Schweizer focuses on the more practical aspects of the two criteria. It begins with the concrete situation of a Markovian stochastic volatility setting and there provides general comparative results on prices, hedging strategies and risks for local risk-minimization versus mean-variance hedging. A detailed analysis including numerical results is then performed for the well-known Heston and Stein/Stein stochastic volatility models. The results highlight some important quantitative differences between the two approaches and give some directions for future research.

Part IV contains papers on the optimal portfolio selection problem. The article “Theory of portfolio optimization in markets with frictions” by one of the editors (J.C.) surveys results on extending the classical Merton’s utility maximization problem in continuous-time models driven by Brownian motion, to the case of markets which are incomplete due to the presence of portfolio constraints, transaction costs, different borrowing and lending rates, and so on. The methodology employed is to first characterize the minimal cost of super-replicating a given claim in such markets, and then solve an optimization problem dual to the utility maximization problem. If the dual problem is appropriately defined, it can then be shown, using the results on super-replication, that the optimal strategy can be characterized in terms of the solution to the dual problem. Explicit results are available for many examples in the case of portfolio constraints and different borrowing and lending rates, but not in the case of transaction costs. In terms of open problems, as far as the general theory is concerned, some of these

results have not yet been fully extended to general arbitrage-free semimartingale models.

“Bayesian adaptive portfolio optimization” by I. Karatzas and X. Zhao also considers the portfolio optimization problem, but in the framework of the stock return rates being unobserved by the investor. Instead, they are modeled in a Bayesian fashion, as a random vector with a known probability distribution. The investor is assumed to observe past and present stock prices, and has to base investment decisions only on that information. The value function is obtained using both filtering/martingale and stochastic control/partial differential equation techniques. The former approach transforms the problem into one with the drift process adapted to the observation process, while the latter approach is used to show that the Hamilton–Jacobi–Bellman equation for this problem takes the form of a generalized Monge–Ampère equation, which is solved fairly explicitly. Next, it is shown that, for the logarithmic utility function, the cost of uncertainty about the unknown drift of the stock prices (relative to an investor who can observe the drift) is asymptotically negligible. The results are also extended to the case of portfolio constraints. The article is a contribution to the very lively line of research in financial economics and mathematics dealing with problems of incomplete or asymmetric information.

The editors would like to express their gratitude to the individuals who made the book possible. Thanks are above all due to all the contributors – they have worked with us with enthusiasm and efficiency, making the editorial job truly enjoyable. The project would not have been possible without the immense efforts, support and vision of David Tranah of Cambridge University Press. We are sincerely grateful for his high professionalism and constant encouragement. We are also thankful to Elsevier, for permitting us to reprint the paper by Boyle, Broadie and Glasserman in this book.
J.C., E.J. and M.M.

Part one Option Pricing: Theory and Practice

1 Arbitrage Theory Yu. M. Kabanov

1 Introduction We shall consider models where an investor, acting on a financial market with random price movements and having $T$ as his time horizon, transforms the initial endowment $\xi$ into a certain resulting wealth; let $R_T^\xi$ denote the set of all final wealths corresponding to possible investment strategies. The natural question is whether the investor has arbitrage opportunities, i.e. whether he can get non-risky profits. Let us “hide” in a “black box” the interior dynamics on the time interval $[0, T]$ (i.e. the price process specification, market regulations, description of admissible strategies) and examine only the set $R_T^\xi$. At this level of generality, the answer, as well as the hypotheses, should be formulated only in terms of properties of the sets $R_T^\xi$. E.g., in the simplest situation of a frictionless market without constraints, $R_T^0$ is a linear subspace in the space $L^0$ of (scalar) random variables and $R_T^\xi = \xi + R_T^0$. The absence of arbitrage opportunities can be formalized by saying that the intersection of $R_T^0$ with the set $L^0_+$ of non-negative random variables contains only zero. If the underlying probability space is finite, i.e. if we assume in our model only a finite number of states of nature, it is easy to prove that there is no arbitrage if and only if there exists an equivalent “separating” probability measure with respect to which every element of $R_T^0$ has zero mean. A close look at this result shows that this assertion is nothing but the Stiemke lemma [62] of 1915, which is well known in the theory of linear inequalities and linear programming as an example of the so-called alternative (or transposition) theorems, see historical comments in [61]; notice that the earliest alternative theorem, due to Gordan [21] (of 1873), can also be interpreted as a no-arbitrage criterion.

The one-step model can be generalized (or specialized, depending on the point of view) in many directions giving rise to what is called arbitrage theory.
The reader should not be confused by the use of “general” and “special” in this context: obviously, one-step models are particular cases of $N$-period models, but quite often the main difficulties in the analysis of models with a detailed (“specialized”) structure of the “black box” consist in verifying hypotheses of theorems corresponding to the one-step case. The geometric essence of these results is a separation of convex sets with a subsequent identification of the separating functional as a probability measure; the properties of the latter in connection with the price process are of particular interest. To this date one can find in the literature dozens of models of financial markets together with a plethora of definitions of arbitrage opportunities. These models can be classified using the following scheme.

1.1 Finite probability space Assuming only a finite number of states of the nature is popular in the literature on economics. Of course, the hypothesis is not adequate to the basic paradigm of stochastic modeling because random variables with continuous distributions cannot “live” on finite probability spaces. The advantage of working under this assumption is that a very restricted set of mathematical tools (basically, elementary finite-dimensional geometry) is required. Results obtained in this simplified setting have an important educational value and quite often may serve as the starting point for a deeper development.

1.2 General probability space In contrast to the case of finite probability space, the straightforward separation arguments, which are the main instruments to obtain no-arbitrage criteria, fail to be applied without further topological assumptions on $R_T^0$. In many particular cases, especially in the theory of continuous trading, they are not fulfilled. This circumstance led Kreps (1981) to a more sophisticated “no-arbitrage” concept, namely, that of “no free lunch” (NFL). However, certain no-arbitrage criteria are of the same form as for the models with finite probability space $\Omega$.

1.3 Discrete-time multi-period models Even for the case of finite probability space $\Omega$, these models are important because they allow us to describe the intertemporal behavior of investors in financial markets, i.e. to penetrate into the structure of the “black box” using concepts of random processes. One of the most interesting features is that in the simplest model without constraints the value processes of the investor’s portfolios are martingales with respect to separating measures and the same property holds for the underlying price process; this explains the terminology “equivalent martingale measures”. Models based on an infinite $\Omega$ posed challenging mathematical questions, e.g., whether the absence of arbitrage is still equivalent to the existence of an equivalent martingale measure. For a frictionless market the affirmative answer has been given by Dalang, Morton, and Willinger (1990). Their work, together with the earlier paper of Kreps, stimulated further research in geometric functional analysis and stochastic calculus, involving rather advanced mathematics.

1.4 Continuous trading Although continuous-time stochastic processes were used for modeling from the very beginning of mathematical finance (one can say that they were even invented exactly for this purpose, having in mind the Bachelier thesis “Théorie de la spéculation” where Brownian motion appeared for the first time), their “golden age” began in 1973 when the famous Black–Scholes formula was published. Subsequent studies revealed the role of the uniqueness of the equivalent martingale measure for pricing of derivative securities via replication. The importance of no-arbitrage criteria seems to be overestimated in the financial literature: the unfortunate alias FTAP – Fundamental Theorem of Asset (or Arbitrage) Pricing, ambitious and misleading, is still widely used. If there are many equivalent martingale measures, the idea of “pricing by replication” fails: a contingent claim may not belong to $R_T^x$ whatever $x$ is, or may belong to many $R_T^x$. In the latter case it is not clear which martingale measure can be used for pricing and this is the central problem of current studies on incomplete markets.

However, as to mathematics, the no-arbitrage criteria for general semimartingale models are considered among the top achievements of the theory. In 1980 Harrison and Pliska noticed that stochastic calculus, i.e. the integration theory for semimartingales, developed by P.-A. Meyer in a purely abstract way, is “tailor-made” for financial modeling. In 1994 Delbaen and Schachermayer confirmed this conclusion by proving that the absence of arbitrage in the class of elementary, “practically admissible” strategies implies the semimartingale property of the price process. In a series of papers they provided a profound analysis of the various concepts culminating in a result that the Kreps NFL condition (equivalent to a whole series of properties with easier economic interpretation) holds if and only if the price process is a $\sigma$-martingale under some $\tilde P \sim P$.
There is another justification of the increasing interest in semimartingales in financial modeling: mathematical statistics sends alarming signals that in many cases empirical data for financial time series are not compatible with the hypothesis that they are generated by processes with continuous sample paths. Thus, diffusions should be viewed only as strongly stylized models of financial data; it has been revealed that Lévy processes give a much better fit.

1.5 Large financial markets This particular group, including the so-called Arbitrage Pricing Model (or Theory), abbreviated to APM (or APT), due to Ross and Huberman (for the one-period case), has the following specific feature. In contrast with the conventional approach of describing a security market by a single probabilistic model, a sequence of stochastic bases with an increasing but always finite number of assets is considered. One can think that the agent wants to concentrate his activity on smaller portfolios because of his physical limitations but larger portfolios in this market may have better performance. The arbitrage is understood in an asymptotic sense. Its absence implies relationships between model parameters which can be verified empirically. This circumstance makes such models especially attractive. The weak side of APM is the use of the quadratic risk measure. This means that gains are punished together with losses in symmetric ways which is unrealistic. Luckily, the conclusion of APM, the Ross–Huberman boundedness condition, seems to be sufficiently “robust” with respect to the risk measure and the variation of certain model parameters. In the recent papers [36] and [37], where the theory of large financial markets was extended to the general semimartingale framework, the concept of asymptotic arbitrage is developed for an “absolutely” risk-averse agent. In spite of a completely different approach, the absence of asymptotic arbitrage implies, for various particular models, relations similar to the Ross–Huberman condition.

1.6 Models with transaction costs In the majority of models discussed in mathematical finance, the investor’s wealth is scalar, i.e. all positions are measured in units of a single asset (money, bond, bank account, etc.). However, in certain cases, e.g., in models with constraints and, especially, in those taking transaction costs into account, it is quite natural to consider, as the primary object, the whole vector-valued process of current positions, either in physical quantities or in units of values measured by a certain numéraire. It happens that this approach allows not only for a more detailed and realistic description of the portfolio dynamics but also opens new perspectives for further mathematical development, in particular, for an extensive use of ideas from the theory of partially ordered spaces, utility theory, optimal control, and mathematical economics. Until now only a few results are available in this new branch of arbitrage theory. Recent studies [34] and [41] show that the basic concept of arbitrage theory, that of the equivalent martingale measure, should be modified and generalized in an appropriate way. There are various approaches to the problem which will be discussed here. Notice that models with transaction costs quite often were considered as completely different from those of a frictionless market and the classical results could not be obtained as corollaries when transaction costs vanish. The modern trend in the theory is to work in the framework which covers the latter as a special case.

Arbitrage theory includes another, even more important subject, namely, hedging theorems, closely related with the no-arbitrage criteria. These results, discussed in the present survey in a sketchy way, give answers to whether a contingent claim can be replicated in an appropriate sense by a terminal value of a self-financing portfolio or whether a given initial endowment is sufficient to start a portfolio replicating the contingent claim. Other related problems such as market completeness or models with a continuum of securities, arising in the theory of bond markets, are not touched here. The books [52], [57], and [29] may serve as references in convex analysis, probability, and stochastic calculus.

2 Discrete-time models

2.1 General setting Let $(\Omega, \mathcal F, \mathbf F = (\mathcal F_t), P)$ be a stochastic basis (i.e. filtered probability space), $t = 0, 1, \dots, T$. We assume that each $\sigma$-algebra $\mathcal F_t$ is complete. We are given:

• convex cones $R_t^0 \subseteq L^0(\mathbf R^d, \mathcal F_t)$;
• closed convex cones $\mathcal K_t \subseteq L^0(\mathbf R^d, \mathcal F_t)$.

The notation $L^0(K_t, \mathcal F_t)$ is used for the set of all $\mathcal F_t$-measurable random variables with values in the set $K_t$ (or $\mathcal F_t$-measurable selectors of $K_t$ if $K_t$ depends on $\omega$). The usual financial interpretation: $R_t^0$ is the set of portfolio values at the date $t$ corresponding to the zero initial endowment, i.e. all imaginable results that can be obtained by the investor to the date $t$. The cones $\mathcal K_t$ induce the partial orderings in the sets $L^0(\mathbf R^d, \mathcal F_t)$:

$$\xi \ge_t \eta \iff \xi - \eta \in \mathcal K_t.$$

The partial orderings $\ge_t$ allow us to compare current results. As a rule, they are obtained by “lifting” partial orderings from $\mathbf R^d$ to the space of random variables.


A typical example: $\mathcal K_t = L^0(K, \mathcal F_t)$ where $K$ is a closed cone in $\mathbf R^d$ (which may depend on $\omega$ and $t$). In particular, the “standard” ordering $\ge_t$ is induced by $K_t = \mathbf R^d_+$, when $\xi \ge_t \eta$ if $\xi^i \ge \eta^i$ (a.s.) for all $i \le d$; for the case $d = 1$ it is the usual linear ordering of the real line. However, we do not exclude other partial orderings. In the theory of frictionless markets, usually, $d = 1$; for models with transaction costs $d$ is the number of assets in the portfolio.

We define also the set $A_T^0 := R_T^0 - \mathcal K_T$. The elements of $A_T^0$ are interpreted as contingent claims which can be hedged (or super-replicated) by the terminal values of portfolios starting from zero. The linear space $\mathcal L_T := \mathcal K_T \cap (-\mathcal K_T)$ describes the positions $\xi$ such that $\xi \ge_T 0$ and $\xi \le_T 0$, which are “financially equivalent to zero”. The comparison of results can be done modulo this equivalence, i.e. in the quotient space $L^0/\mathcal L_T$ equipped with the ordering induced by the proper cone $\tilde{\mathcal K}_T := \pi_T \mathcal K_T$ where $\pi_T : L^0 \to L^0/\mathcal L_T$ is the natural projection.

2.2 No-arbitrage criteria for finite $\Omega$ The most intuitive formulation of the property that the market has no arbitrage opportunities for the investors without initial capital is the following:

NA. $\mathcal K_T \cap R_T^0 \subseteq \mathcal L_T$.

In the particular case when $\mathcal K_T$ is a proper cone we have

NA′. $\mathcal K_T \cap R_T^0 \subseteq \{0\}$ (with equality if $R_T^0$ is closed).

The first no-arbitrage criterion has the following form.

Theorem 2.1 Let $\Omega$ be finite. Assume that $R_T^0$ is closed. Then NA holds if and only if there exists $\eta \in L^0(\mathbf R^d, \mathcal F_T)$ such that

$$E\eta\zeta > 0 \quad \forall \zeta \in \mathcal K_T \setminus \mathcal L_T$$

and

$$E\eta\zeta \le 0 \quad \forall \zeta \in R_T^0.$$

Because $L^0$ is a finite-dimensional space, this result is a reformulation of Theorem A.2 on separation of convex cones. It is easy to verify that $\mathcal K_T \cap R_T^0 \subseteq \mathcal L_T$ if and only if $\mathcal K_T \cap A_T^0 \subseteq \mathcal L_T$. Hence, in this theorem one can replace $R_T^0$ by $A_T^0$. The above criterion can be classified as a result for the one-step model where $T$ stands for “terminal”. It has important corollaries for multi-period models where the sets $R_T^0$ have a particular structure.


3 Multi-step models

3.1 Notations For $X = (X_t)_{t\ge 0}$ and $Y = (Y_t)_{t\ge 0}$ we define $X_- := (X_{t-1})$ (various conventions for $X_{-1}$ can be used), $\Delta X_t := X_t - X_{t-1}$, and, at last,

$$X \cdot Y_t := \sum_{k=0}^{t} X_k \,\Delta Y_k$$

for the discrete-time integral. Here $X$ and $Y$ can be scalar or vector-valued. In the latter case sometimes we shall use the abbreviation $X \bullet Y$ for the vector process formed by the pairwise integrals of the components:

$$X \bullet Y := (X^1 \cdot Y^1, \dots, X^d \cdot Y^d).$$

Though in the discrete-time case the dynamics can be expressed exclusively in terms of differences, “integral” formulae are often instructive for continuous-time extensions. For finite $\Omega$, if $X$ is a predictable process (i.e. $X_t$ is $\mathcal F_{t-1}$-measurable) and $Y$ belongs to the space $\mathcal M$ of martingales, then $X \cdot Y$ is also a martingale. The product formula

$$\Delta(XY) = X\,\Delta Y + Y_-\,\Delta X$$

is obvious.
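The notation above can be made concrete with a minimal sketch in Python. It assumes the convention $X_{-1} = X_0$ (one of the "various conventions" mentioned above), uses exact rational arithmetic, and works on two hypothetical sample paths; the helper names `increments`, `integral` and `shift` are illustrative only. It checks the product formula increment by increment and its summed form (discrete integration by parts).

```python
from fractions import Fraction


def increments(Y):
    """Return [ΔY_0, ..., ΔY_T] with ΔY_t = Y_t - Y_{t-1}, assuming Y_{-1} = Y_0."""
    return [Y[t] - Y[max(t - 1, 0)] for t in range(len(Y))]


def integral(X, Y):
    """Return the path [X·Y_0, ..., X·Y_T], where X·Y_t = sum_{k<=t} X_k ΔY_k."""
    path, total = [], 0
    for x, dy in zip(X, increments(Y)):
        total += x * dy
        path.append(total)
    return path


def shift(X):
    """Return X_- = (X_{t-1}), assuming X_{-1} = X_0."""
    return [X[0]] + X[:-1]


# Hypothetical sample paths, chosen only for illustration.
X = [Fraction(n) for n in (2, 3, 5, 7)]
Y = [Fraction(n) for n in (1, 4, 2, 6)]
XY = [x * y for x, y in zip(X, Y)]

# Product formula Δ(XY) = X ΔY + Y_- ΔX, checked for every t.
lhs = increments(XY)
rhs = [x * dy + y_prev * dx
       for x, dy, y_prev, dx in zip(X, increments(Y), shift(Y), increments(X))]
assert lhs == rhs

# Summing the product formula gives XY - X_0 Y_0 = X·Y + Y_-·X.
by_parts = [a + b for a, b in zip(integral(X, Y), integral(shift(Y), X))]
assert by_parts == [xy - XY[0] for xy in XY]
```

With a different convention for $X_{-1}$ only the term at $t = 0$ changes; the increments for $t \ge 1$ are unaffected.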

3.2 Example 1. Model of frictionless market The model being classical, we do not give details and financial interpretations: they are widely available in many textbooks. Let $S = (S_t)$, $t = 0, 1, \dots, T$, be a fixed $n$-dimensional process adapted to a discrete-time filtration $\mathbf F = (\mathcal F_t)$. Here $T$ is a finite integer and, for simplicity, the $\sigma$-algebra $\mathcal F_0$ is assumed to be trivial. The convention $S_{-1} = S_0$ is used. Define $R_T^0$ as the linear space of all scalar random variables of the form $N \cdot S_T$ where $N$ is an $n$-dimensional predictable process. For $x \in \mathbf R$ we put $R_T^x = x + R_T^0$. We take $\mathcal K_0 := \mathbf R_+$ and $\mathcal K_T := L^0(\mathbf R_+, \mathcal F_T)$. The components $S^i$ describe the price evolution of $n$ risky securities, $N^i$ is the portfolio strategy which is self-financing, and $V$ is the value process. In this specification it is tacitly assumed that there is a traded asset with the constant unit price, i.e. this asset is the numéraire.

Remark 3.1 One should take care that there is another specification where the numéraire is not necessarily a traded asset. A possible confusion may arise because the formula for the value process looks similar but the integrand and the integrator are in the latter case $d$-dimensional processes with $d = n + 1$. The increments of a self-financing portfolio strategy are explicitly constrained by the relation $S_{t-1}\,\Delta N_t = 0$. If the numéraire (“cash” or “bond”) is traded, the integral with respect to the latter vanishes but, of course, holdings in “cash” are not arbitrary but defined from the above relation.

For finite $\Omega$ we have, in virtue of Theorem 2.1, that the model has no arbitrage if and only if there is a strictly positive random variable $\eta$ such that $E\eta\zeta = 0$ for all $\zeta \in R_T^0$. Without loss of generality we may assume that $E\eta = 1$ and define the probability measure $\tilde P = \eta P$. Clearly, $\tilde E\zeta = 0$ for all $\zeta \in R_T^0$ (i.e. $\tilde E\, N \cdot S_T = 0$ for all predictable $N$) if and only if $S$ is a martingale with respect to $\tilde P$. With this remark we get the Harrison–Pliska theorem:

Theorem 3.2 Assume that $\Omega$ is finite. Then the following conditions are equivalent:

(a) $R_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$ (no-arbitrage);
(b) there exists a measure $\tilde P \sim P$ such that $S \in \mathcal M(\tilde P)$.

Let $\rho_t := d\tilde P_t/dP_t$ be the density corresponding to the restrictions of $\tilde P$ and $P$ to $\mathcal F_t$. Recall that the density process $\rho = (\rho_t)$ is a martingale, $\rho_t = E(\rho_T \mid \mathcal F_t)$. Since

$$S \in \mathcal M(\tilde P) \iff S\rho \in \mathcal M(P),$$

we can add to the conditions of the above theorem the following one:

(b′) there is a strictly positive martingale $\rho$ such that $\rho S \in \mathcal M$.

Notice that the equivalence of (b) and (b′) is a general fact which holds for arbitrary $\Omega$ and even in the continuous-time setting. Though the property (b′) can be considered simply as a reformulation of (b), it is more adapted to various extensions. The advantage of (b) is in the interpretation of $\tilde P$ as a “risk-neutral” probability.
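As a hedged illustration of Theorem 3.2 in the smallest possible case, the sketch below treats a one-step model ($T = 1$) with a single risky asset, a trivial $\mathcal F_0$, and a finite $\Omega$ all of whose states have positive probability. The function name `martingale_density` and the explicit choice of density are assumptions made for this example, not the construction used in the text: it returns one strictly positive $\eta$ with $E\eta = 1$ and $E\eta\,\Delta S_1 = 0$ when such an $\eta$ exists, and `None` when the price can move in one direction only (so that $N_1 = \pm 1$ is an arbitrage).

```python
from fractions import Fraction as F


def martingale_density(dS, P):
    """One step, scalar price, trivial F_0, P[i] > 0 for every state i.

    Return eta > 0 with E[eta] = 1 and E[eta * dS] = 0, or None if the
    model admits arbitrage.
    """
    up = sum(p * x for p, x in zip(P, dS) if x > 0)      # E[(dS)^+]
    down = -sum(p * x for p, x in zip(P, dS) if x < 0)   # E[(dS)^-]
    if (up > 0) != (down > 0):
        return None          # the price moves in one direction only
    if up == 0:
        return [F(1)] * len(P)   # dS = 0 a.s.: P itself is a martingale measure
    # Weight up-states by E[(dS)^-] and down-states by E[(dS)^+].
    raw = [down if x > 0 else up if x < 0 else F(1) for x in dS]
    norm = sum(p * e for p, e in zip(P, raw))
    return [e / norm for e in raw]


# Hypothetical three-state model with S_0 = 1 and S_1 in {1/2, 1, 2}.
P = [F(1, 3)] * 3
dS = [F(-1, 2), F(0), F(1)]
eta = martingale_density(dS, P)
P_tilde = [p * e for p, e in zip(P, eta)]   # the measure eta * P

assert all(e > 0 for e in eta) and sum(P_tilde) == 1
assert sum(q * x for q, x in zip(P_tilde, dS)) == 0   # S is a martingale under P_tilde

# A price that never falls and sometimes rises admits arbitrage.
assert martingale_density([F(0), F(1), F(2)], P) is None
```

In this example the martingale measure is not unique (any positive weight on the middle state can be used), which is the incompleteness phenomenon mentioned in Section 1.4.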

3.3 Example 2. Model with transaction costs Now we describe a discrete-time version of a multi-currency model with proportional transaction costs introduced in [34] and studied in the papers [11] and [41]. It is assumed that the components of an adapted process $S = (S_t^1, \dots, S_t^d)$, $t = 0, 1, \dots, T$, describing the dynamics of prices of certain assets, e.g., currencies quoted in a certain reference asset (say, “euro”), are strictly positive. It is convenient to choose the scales to have $S_0^i = 1$ for all $i$. We do not suppose that the numéraire is a traded security.

The transaction costs coefficients are given by an adapted process $\Lambda = (\lambda^{ij})$ taking values in the set $\mathbf M^d_+$ of non-negative $d \times d$ matrices with zero diagonal.

The agent’s portfolio at time $t$ can be described either by a vector of “physical” quantities $\hat V_t = (\hat V_t^1, \dots, \hat V_t^d)$ or by a vector $V_t = (V_t^1, \dots, V_t^d)$ of values invested in each asset. The relation

$$\hat V_t^i = V_t^i / S_t^i, \qquad i \le d,$$

is obvious. Introducing the diagonal operator

$$\phi_t(\omega) : (x^1, \dots, x^d) \mapsto (x^1/S_t^1(\omega), \dots, x^d/S_t^d(\omega)), \tag{1}$$

we may write that $\hat V_t = \phi_t V_t$.

The increments of portfolio values are

$$\Delta V_t^i = \hat V_{t-1}^i \,\Delta S_t^i + b_t^i \tag{2}$$

with

$$b_t^i = \sum_{j=1}^{d} \alpha_t^{ji} - \sum_{j=1}^{d} (1 + \lambda_t^{ij})\,\alpha_t^{ij},$$

where $\alpha_t^{ji} \in L^0(\mathbf R_+, \mathcal F_t)$ represents the net amount transferred from the position $j$ to the position $i$ at the date $t$. The first term in the right-hand side of (2) is due to the price increment while the second corresponds to the agent’s actions (made after the revealing of new prices). Notice that these actions are charged by the amount

$$-\sum_{i=1}^{d} b_t^i = \sum_{i=1}^{d}\sum_{j=1}^{d} \lambda_t^{ij}\,\alpha_t^{ij}$$

diminishing the total portfolio value.

With every $\mathbf M^d_+$-valued process $(\alpha_t)$ and any initial endowment $v = V_{-1} \in \mathbf R^d$ we associate, using recursively the formula (2), a value process $V = (V_t)$, $t = 0, \dots, T$. The terminal values of these processes form the set $R_T^v$.

Remark 3.3 In the literature one can find other specifications for transaction costs coefficients. To explain the situation, let us define $\tilde\alpha_t^{ij} := (1 + \lambda_t^{ij})\,\alpha_t^{ij}$. The increment of value of the $i$-th position can be written as

$$b_t^i = \sum_{j=1}^{d} \mu_t^{ji}\,\tilde\alpha_t^{ji} - \sum_{j=1}^{d} \tilde\alpha_t^{ij},$$

where $\mu_t^{ji} := 1/(1 + \lambda_t^{ji}) \in\, ]0, 1]$. The matrix $(\mu^{ij})$ can be specified as the matrix of the transaction costs coefficients. In models with a traded numéraire, i.e. a non-risky asset, a mixture of both specifications is used quite often.
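One step of the recursion (2), the total charge of the transfers, and the alternative specification of Remark 3.3 can be checked on a small numerical sketch. Everything below is a hypothetical illustration with $d = 2$: the function names, the coefficients `lam`, the transfer matrix `alpha` and the prices are assumptions for this example, with `alpha[i][j]` standing for the amount transferred from position $i$ to position $j$.

```python
from fractions import Fraction as F


def transfer_effect(lam, alpha):
    """b^i = sum_j alpha^{ji} - sum_j (1 + lam^{ij}) alpha^{ij}."""
    d = len(alpha)
    return [sum(alpha[j][i] for j in range(d))
            - sum((1 + lam[i][j]) * alpha[i][j] for j in range(d))
            for i in range(d)]


def step(V_prev, S_prev, S_new, lam, alpha):
    """One step of (2): V_t^i = V_{t-1}^i + (V_{t-1}^i / S_{t-1}^i) ΔS_t^i + b_t^i."""
    b = transfer_effect(lam, alpha)
    return [v + (v / s0) * (s1 - s0) + bi
            for v, s0, s1, bi in zip(V_prev, S_prev, S_new, b)]


# Hypothetical data: 10% cost on transfers 0 -> 1, 20% on transfers 1 -> 0.
lam = [[F(0), F(1, 10)], [F(1, 5), F(0)]]
alpha = [[F(0), F(10)], [F(0), F(0)]]      # move 10 units of value from 0 to 1

b = transfer_effect(lam, alpha)
cost = sum(lam[i][j] * alpha[i][j] for i in range(2) for j in range(2))
assert -sum(b) == cost                      # the transfers are charged by sum lam*alpha

# The second asset doubles in price, then the transfer is made.
V = step([F(100), F(50)], [F(1), F(1)], [F(1), F(2)], lam, alpha)

# Remark 3.3: the same increment written with mu = 1/(1 + lam).
alpha_tilde = [[(1 + lam[i][j]) * alpha[i][j] for j in range(2)] for i in range(2)]
mu = [[1 / (1 + lam[i][j]) for j in range(2)] for i in range(2)]
b_alt = [sum(mu[j][i] * alpha_tilde[j][i] for j in range(2))
         - sum(alpha_tilde[i][j] for j in range(2))
         for i in range(2)]
assert b_alt == b
```

Here position 0 is debited 11 while position 1 is credited 10, so the total portfolio value drops by the cost 1.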

Before analyzing the model, we write it in a more convenient way reducing the dimension of the action space. To this aim we define, for every $(\omega, t)$, the convex cone

$$M_t(\omega) := \Big\{ x \in \mathbf R^d : \ \exists\, a \in \mathbf M^d_+ \ \text{such that} \ x^i = \sum_{j=1}^{d} \big[(1 + \lambda_t^{ij}(\omega))\,a^{ij} - a^{ji}\big], \ i \le d \Big\},$$

which is a polyhedral one as it is the image of the polyhedral cone $\mathbf M^d_+$ under a linear mapping. Its dual positive cone

$$M_t^*(\omega) := \Big\{ w \in \mathbf R^d : \ \inf_{x \in M_t(\omega)} wx \ge 0 \Big\}$$

can be easily described by linear homogeneous inequalities. Specifically,

$$M_t^*(\omega) = \big\{ w \in \mathbf R^d : \ w^j - (1 + \lambda_t^{ij}(\omega))\,w^i \le 0, \ 1 \le i, j \le d \big\}.$$

We introduce also the solvency cone (in values)

$$K_t(\omega) := \Big\{ x \in \mathbf R^d : \ \exists\, a \in \mathbf M^d_+ \ \text{such that} \ x^i + \sum_{j=1}^{d} \big[a^{ji} - (1 + \lambda_t^{ij}(\omega))\,a^{ij}\big] \ge 0, \ i \le d \Big\},$$

i.e. $K_t(\omega) = M_t(\omega) + \mathbf R^d_+$. The negative holdings of a position vector in $K_t(\omega)$ can be liquidated (under transaction costs given by $(\lambda_t^{ij}(\omega))$) to get a position vector in $\mathbf R^d_+$.

Let $\mathcal B$ be the set of all processes $B = (B_t)$ with $\Delta B_t \in L^0(-M_t, \mathcal F_t)$. It is an easy exercise on measurable selection to check that $\Delta B_t$ can be represented using a certain $\mathcal F_t$-measurable transfer matrix $\alpha_t$. Thus, the set of portfolio processes in the “value domain” coincides with the set of processes $V = V^{v,B}$, $B \in \mathcal B$, given by the system of linear difference equations

$$\Delta V_t^i = V_{t-1}^i\,\Delta Y_t^i + \Delta B_t^i, \qquad V_{-1}^i = v^i, \tag{3}$$

with

$$\Delta Y_t^i = \frac{\Delta S_t^i}{S_{t-1}^i}, \qquad Y_0^i = 1. \tag{4}$$
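The inequality description of the dual cone $M_t^*(\omega)$ given above can be checked numerically for a fixed $(\omega, t)$. The sketch below is an illustration under stated assumptions: it uses the fact that $M_t(\omega)$ is generated by the images $(1 + \lambda^{ij})e_i - e_j$ of the elementary transfers (a consequence of the definition, spelled out here rather than in the text), and compares "non-negative on every generator" with the system of inequalities. The function names and the coefficients `lam` are hypothetical.

```python
from fractions import Fraction as F


def generators(lam):
    """Generators of M: the vectors (1 + lam^{ij}) e_i - e_j for i != j."""
    d = len(lam)
    gens = []
    for i in range(d):
        for j in range(d):
            if i != j:
                x = [F(0)] * d
                x[i] += 1 + lam[i][j]
                x[j] -= 1
                gens.append(x)
    return gens


def in_dual_by_definition(lam, w):
    """w is in M* iff w.x >= 0 for every generator x of the cone M."""
    return all(sum(wi * xi for wi, xi in zip(w, x)) >= 0 for x in generators(lam))


def in_dual_by_inequalities(lam, w):
    """w is in M* iff w^j - (1 + lam^{ij}) w^i <= 0 for all i, j."""
    d = len(w)
    return all(w[j] - (1 + lam[i][j]) * w[i] <= 0
               for i in range(d) for j in range(d))


# Hypothetical coefficients: 10% cost for 0 -> 1, 20% for 1 -> 0.
lam = [[F(0), F(1, 10)], [F(1, 5), F(0)]]

# The two descriptions agree on a grid of test vectors.
grid = [[F(p), F(q)] for p in range(-3, 4) for q in range(-3, 4)]
assert all(in_dual_by_definition(lam, w) == in_dual_by_inequalities(lam, w)
           for w in grid)

inside = in_dual_by_inequalities(lam, [F(1), F(1)])    # ratio w^1/w^0 within the cost bounds
outside = in_dual_by_inequalities(lam, [F(1), F(2)])   # ratio too large
```

For strictly positive $w$ the inequalities say that each ratio $w^j/w^i$ does not exceed $1 + \lambda^{ij}$, which is why elements of the dual cone are often read as price systems compatible with the transaction costs.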


Remark 3.4 Using the notations introduced at the beginning of this section, we can rewrite these equations in the integral form

$$V = v + V_- \bullet Y + B, \tag{5}$$

with

$$Y^i = 1 + (1/S_-^i) \cdot S^i, \tag{6}$$

which remains the same also for the continuous-time version but with a different meaning of the symbols, see [34], [39].

It is easier to study no-arbitrage properties of the model working in the “physical domain” where the portfolio evolves only because of the agent’s action. Indeed, the dynamics of $\hat V$ is simpler:

$$\Delta \hat V_t^i = \frac{\Delta B_t^i}{S_t^i}.$$

This equation is obvious because of its financial interpretation but one can check it formally (e.g., using the product formula).

Put $\hat M_t(\omega) := \phi_t(\omega) M_t(\omega)$ and introduce the solvency cone (in physical units)

$$\hat K_t(\omega) := \phi_t(\omega) K_t(\omega) = \hat M_t(\omega) + \mathbf R^d_+.$$

Every process $b$ with $b_t \in L^0(-\hat M_t, \mathcal F_t)$, $0 \le t \le T$, defines a portfolio process $\hat V$ with $\Delta \hat V = b$ and the zero initial endowment. All portfolio processes (in physical units) can be obtained in this way. The notations $R_T^0$ and $\hat R_T^0$ are obvious.

Lemma 3.5 The following conditions are equivalent:

(a) $R_T^0 \cap L^0(K_T, \mathcal F_T) \subseteq L^0(\partial K_T, \mathcal F_T)$;
(b) $R_T^0 \cap L^0(\mathbf R^d_+, \mathcal F_T) = \{0\}$;
(c) $\hat R_T^0 \cap L^0(\mathbf R^d_+, \mathcal F_T) = \{0\}$.

Proof The equivalence of (b) and (c) is obvious. The implication (a) ⇒ (b) holds because $\mathbf R^d_+ \setminus \{0\}$ is a subset of $\operatorname{int} K_T$. To prove the remaining implication (b) ⇒ (a) we notice that if $V_T^B \in L^0(K_T, \mathcal F_T)$ where $B \in \mathcal B$ then there exists $B' \in \mathcal B$ such that $V_T^{B'} \in L^0(\mathbf R^d_+, \mathcal F_T)$ and $V_T^{B'}(\omega) \ne 0$ on the set $\{V_T^B(\omega) \notin \partial K_T(\omega)\}$. To construct such $B'$, it is sufficient to modify only $\Delta B_T$ by combining the last transfer with the liquidation of the negative positions.

In accordance with [41] we shall say that the market has the weak no-arbitrage property at the date $T$ ($\mathrm{NA}^w_T$) if one of the equivalent conditions of the above lemma is fulfilled. Apparently, $\mathrm{NA}^w_T$ implies $\mathrm{NA}^w_t$ for all $t \le T$.


Lemma 3.6 Assume that $\Omega$ is finite. Then $\hat R_T^0 \cap L^0(\mathbf R^d_+, \mathcal F_T) = \{0\}$ if and only if there exists a $d$-dimensional martingale $Z$ with strictly positive components such that $Z_t \in L^0(\hat M_t^*, \mathcal F_t)$.

Proof The cone $\hat R_T^0$ is polyhedral. In virtue of Theorem 2.1 the first condition is equivalent to the existence of a strictly positive random variable $\eta$ such that $E\eta\zeta \le 0$ for all $\zeta \in \hat R_T^0$. Let $Z_t = E(\eta \mid \mathcal F_t)$. Since $L^0(-\hat M_t, \mathcal F_t) \subseteq \hat R_T^0$, the inequality $E Z_t \zeta \ge 0$ holds for all $\zeta \in L^0(\hat M_t, \mathcal F_t)$, implying that $Z_t \in L^0(\hat M_t^*, \mathcal F_t)$. If the second condition of the lemma is fulfilled, we can take $\eta = Z_T$.

Let $\mathcal D$ be the set of martingales $Z = (Z_t)$ such that $\hat Z_t \in L^0(K_t^*, \mathcal F_t)$, where $\hat Z_t := \phi_t Z_t$. The following result from [41] is a simple corollary of the above criteria:

Theorem 3.7 Assume that $\Omega$ is finite. Then $\mathrm{NA}^w_T$ holds if and only if there exists a process $Z \in \mathcal D$ with strictly positive components.

This result contains the Harrison–Pliska theorem. Indeed, in the case where all $\lambda^{ij} = 0$, the cone $K = \tilde K := \{x \in \mathbf R^d : x\mathbf 1 \ge 0\}$ and $K^* = \mathbf R_+ \mathbf 1$. Thus, for $Z \in \mathcal D$ all components of the process $\hat Z$ are equal. If, e.g., the first asset is the numéraire, then $\hat Z^1 = Z^1$ is a martingale as well as the processes $S^i \hat Z^1$, $i = 2, \dots, d$, i.e. $Z^1$ is a martingale density.

Remark 3.8 For models with transaction costs other types of arbitrage may be of interest. E.g., it is quite natural to consider the ordering induced by the cone $\tilde K := \{x \in \mathbf R^d : x\mathbf 1 \ge 0\}$ (corresponding to the absence of transaction costs), see a criterion in [41] which can be obtained along the same lines as above.

Remark 3.9 It is easily seen that

$$\hat M_t(\omega) := \Big\{ y \in \mathbf R^d : \ \exists\, c \in \mathbf M^d_+ \ \text{such that} \ y^i = \sum_{j=1}^{d} \big[\pi_t^{ij}(\omega)\,c^{ij} - c^{ji}\big], \ i \le d \Big\}, \tag{7}$$

where

$$\pi_t^{ij} := (1 + \lambda_t^{ij})\,S_t^j / S_t^i, \qquad 1 \le i, j \le d. \tag{8}$$

One can start the modeling by specifying instead of the process $(\lambda_t^{ij})$ the process $(\pi_t^{ij})$ with values in the set of non-negative matrices with units on the diagonal. Defining directly the set of processes $\hat V$ with $\Delta \hat V_t \in L^0(-\hat M_t, \mathcal F_t)$ and the set of “results” $\hat R_T^0$, one can get Lemma 3.6 immediately. The advantage of this approach is that the existence of the reference asset (i.e. of the price process $S$) is not assumed and we have a model of “pure exchange”. A question arises when such a model can be reduced to a transaction costs model with a reference asset, i.e. under what conditions on the matrix $(\pi^{ij})$ one can find a matrix $(\lambda^{ij})$ with positive entries and a vector $S$ with strictly positive entries satisfying the relation (8).
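The passage from values to physical units in Remark 3.9 can be verified on numbers. The sketch below fixes $(\omega, t)$, takes a transfer matrix $a$ in values, and assumes the substitution $c^{ij} := a^{ij}/S^j$ (the transferred amount expressed in units of the receiving asset); this substitution is a reading of (7) and (8) rather than a formula quoted from the text. All names and data are hypothetical.

```python
from fractions import Fraction as F


def cone_point_values(lam, a):
    """x^i = sum_j [(1 + lam^{ij}) a^{ij} - a^{ji}], a point of the cone M (values)."""
    d = len(a)
    return [sum((1 + lam[i][j]) * a[i][j] - a[j][i] for j in range(d))
            for i in range(d)]


def cone_point_physical(pi, c):
    """y^i = sum_j [pi^{ij} c^{ij} - c^{ji}], a point of the cone in physical units, see (7)."""
    d = len(c)
    return [sum(pi[i][j] * c[i][j] - c[j][i] for j in range(d))
            for i in range(d)]


# Hypothetical prices, cost coefficients and transfers (in values).
S = [F(2), F(4)]
lam = [[F(0), F(1, 10)], [F(1, 5), F(0)]]
a = [[F(0), F(8)], [F(4), F(0)]]

# Relation (8) and the assumed change of variables c^{ij} = a^{ij} / S^j.
pi = [[(1 + lam[i][j]) * S[j] / S[i] for j in range(2)] for i in range(2)]
c = [[a[i][j] / S[j] for j in range(2)] for i in range(2)]

x = cone_point_values(lam, a)
y = cone_point_physical(pi, c)

# Applying the diagonal operator phi (division by S) to x gives y.
assert y == [xi / si for xi, si in zip(x, S)]
# The matrix pi has units on the diagonal, as stated in Remark 3.9.
assert all(pi[i][i] == 1 for i in range(2))
```

The entry $\pi^{ij}$ can be read as the number of units of asset $i$ that must be given up to obtain one unit of asset $j$, which is why no reference asset is needed once $(\pi^{ij})$ is specified directly.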

3.4 The Dalang–Morton–Willinger theorem Let us consider again the classical model of a frictionless market but now without any assumption on the stochastic basis.

Theorem 3.10 The following conditions are equivalent:

(a) $R_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$ (no-arbitrage);
(b) $A_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$;
(c) $A_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$ and $A_T^0 = \bar A_T^0$, the closure in $L^0$;
(d) $\bar A_T^0 \cap L^0(\mathbf R_+, \mathcal F_T) = \{0\}$;
(e) for every probability measure $P' \sim P$ there is a measure $\tilde P \sim P$ such that $d\tilde P/dP' \le \mathrm{const}$ and $S \in \mathcal M(\tilde P)$;
(f) there is a probability measure $\tilde P \sim P$ such that $S \in \mathcal M(\tilde P)$;
(g) there is a probability measure $\tilde P \sim P$ such that $S \in \mathcal M_{loc}(\tilde P)$.

It seems that these equivalent conditions (among many others) are the most essential ones to be collected in a single theorem. The equivalence of (a), (e), and (f), relating a “financial property” of absence of arbitrage with important “probabilistic” properties, is due to Dalang, Morton, and Willinger [8]. Their approach is based on a reduction to a one-stage problem which is very simple for the case of trivial initial $\sigma$-algebra; regular conditional distributions and a measurable selection theorem allow us to extend the arguments to treat the general case, see [53], [29], and [58] for other implementations of the same idea.

Formally, the equivalence (a) ⇔ (f) is exactly the same as the Harrison–Pliska theorem and one could think that it is just the same result under the relaxed hypothesis on $\Omega$. In fact, such a conclusion seems to be superficial: the equivalent “functional-analytic property” (c), discovered by Schachermayer in [56], shows clearly the profound difference between these two situations. Schachermayer’s condition opens the door to an extensive use of geometric functional analysis in the discrete-time setting which was reserved previously only for continuous-time models. It is quite interesting to notice that the set $R_T^0$ is always closed while $A_T^0$ is not.

The condition (d) introduced by Stricker in [60] also gives a hint on an appropriate use of separation arguments. Specifically, the Kreps–Yan theorem (see the Appendix) can be applied to separate $\bar A_T^0 \cap L^1(P')$ from $L^1_+(P') = L^1(\mathbf R_+, P')$ where the measure $P' \sim P$ can be chosen arbitrarily: this freedom allows us to obtain an “equivalent separating measure” with a desired property.


Notice that the crucial implication (b) ⇒ (d) seems to be easier to prove than (a) ⇒ (c), see [36] where a kind of “linear algebra” with random coefficients was suggested. The literature provides a variety of other equivalent conditions complementing the list of the above theorem. Some of them are interesting and non-trivial. A family of conditions is related with various classes of admissible strategies B (which is the set of all predictable process in our formulation). Since the sets RT0 and A0T depend on this class, so does the no-arbitrage property. It happens, however, that the latter is quite “robust”: e.g., it remains the same if we consider as admissible only the strategies with non-negative value processes. The problem of admissibility is not of great importance since we assume a finite time horizon. The situation is radically different for continuous-time models where one must work out the doubling strategies which allow us to win even betting on a martingale. Proof of Theorem 3.10 The implications (a) ⇒ (b) and (c) ⇒ (d) are obvious as well as the chain (e) ⇒ ( f ) ⇒ (g). To prove the implication (d) ⇒ (e) we observe that the two properties are invariant under the equivalent change of measure. Thus, we may assume that P = P and, moreover, by passing to the measure ce−η P with η = supt≤T |St |, that all St are integrable. The set A¯ 10 ∩ L 1 is closed in L 1 and intersects with L 1+ ˜ P ∈ L ∞ such only at zero. By the Kreps–Yan theorem there is a P˜ with d P/d 1 1 ˜ ≤ 0 for all ξ ∈ A¯ 0 ∩ L . Taking ξ = ±Ht St where Ht is bounded and that Eξ Ft−1 -measurable, we conclude that S is a martingale. The implication (g) ⇒ (a) is also easy. If H · St ≥ 0 for all t ≤ T , then, ˜ ˜ by the Fatou lemma, the local P-martingale H · S is a P-supermartingale and, therefore, E˜ H · ST ≤ 0, i.e. H · ST = 0. In other words, there is no arbitrage in the class of strategies with non-negative value processes. 
This implies (a) since for any arbitrage opportunity $H$ there is an arbitrage opportunity $H'$ with non-negative value process. Indeed, if $P(H\cdot S_s \le -b) > 0$ for some $s < T$ and $b > 0$, then one can take $H' = I_{]s,T]\times\{H\cdot S_s \le -b\}} H$. In the proof of the “difficult” implication (b) $\Rightarrow$ (c) we follow [42].

Lemma 3.11 Let $\eta^n \in L^0(\mathbf{R}^d)$ be such that $\underline{\eta} := \liminf |\eta^n| < \infty$. Then there are $\tilde\eta^k \in L^0(\mathbf{R}^d)$ such that for all $\omega$ the sequence of $\tilde\eta^k(\omega)$ is a convergent subsequence of the sequence of $\eta^n(\omega)$.

Proof Let $\tau_0 := 0$ and $\tau_k := \inf\{n > \tau_{k-1} : \big||\eta^n| - \underline{\eta}\big| \le 1/k\}$. Then $\tilde\eta^{k,0} := \eta^{\tau_k}$ is in $L^0(\mathbf{R}^d)$ and $\sup_k |\tilde\eta^{k,0}| < \infty$. Working further with the sequence of $\tilde\eta^{n,0}$ we construct, applying the above procedure to the first component, a sequence of $\tilde\eta^{k,1}$ with the convergent first component and such that for all $\omega$ the sequence of $\tilde\eta^{k,1}(\omega)$ is

1. Arbitrage Theory

a subsequence of the sequence of $\tilde\eta^{n,0}(\omega)$. Passing on each step to the newly created sequence of random variables and to the next component we arrive at a sequence with the desired properties.

To show that $A_T^0$ is closed we proceed by induction. Let $T = 1$. Suppose that $H_1^n \Delta S_1 - r^n \to \zeta$ a.s., where $H_1^n$ is $\mathcal{F}_0$-measurable and $r^n \in L^0_+$. It is sufficient to find $\mathcal{F}_0$-measurable random variables $\tilde H_1^k$ convergent a.s. and $\tilde r^k \in L^0_+$ such that $\tilde H_1^k \Delta S_1 - \tilde r^k \to \zeta$ a.s. Let $\Omega_i \in \mathcal{F}_0$ form a finite partition of $\Omega$. Obviously, we may argue on each $\Omega_i$ separately as on an autonomous measure space (considering the restrictions of random variables and traces of $\sigma$-algebras). Let $\underline{H}_1 := \liminf |H_1^n|$. On $\Omega_1 := \{\underline{H}_1 < \infty\}$ we take, using Lemma 3.11, $\mathcal{F}_0$-measurable $\tilde H_1^k$ such that $\tilde H_1^k(\omega)$ is a convergent subsequence of $H_1^n(\omega)$ for every $\omega$; $\tilde r^k$ are defined correspondingly. Thus, if $\Omega_1$ is of full measure, the goal is achieved. On $\Omega_2 := \{\underline{H}_1 = \infty\}$ we put $G_1^n := H_1^n/|H_1^n|$ and $h_1^n := r^n/|H_1^n|$ and observe that $G_1^n \Delta S_1 - h_1^n \to 0$ a.s. By Lemma 3.11 we find $\mathcal{F}_0$-measurable $\tilde G_1^k$ such that $\tilde G_1^k(\omega)$ is a convergent subsequence of $G_1^n(\omega)$ for every $\omega$. Denoting the limit by $\tilde G_1$, we obtain that $\tilde G_1 \Delta S_1 = \tilde h_1$ where $\tilde h_1$ is non-negative, hence, in virtue of (b), $\tilde G_1 \Delta S_1 = 0$. As $\tilde G_1(\omega) \ne 0$, there exists a partition of $\Omega_2$ into $d$ disjoint subsets $\Omega_2^i \in \mathcal{F}_0$ such that $\tilde G_1^i \ne 0$ on $\Omega_2^i$. Define $\bar H_1^n := H_1^n - \beta^n \tilde G_1$ where $\beta^n := H_1^{ni}/\tilde G_1^i$ on $\Omega_2^i$. Then $\bar H_1^n \Delta S_1 = H_1^n \Delta S_1$ on $\Omega_2$. We repeat the procedure on each $\Omega_2^i$ with the sequence $\bar H_1^n$ knowing that $\bar H_1^{ni} = 0$ for all $n$. Apparently, after a finite number of steps we construct the desired sequence.

Let the claim be true for $T-1$ and let $\sum_{t=1}^T H_t^n \Delta S_t - r^n \to \zeta$ a.s., where $H_t^n$ are $\mathcal{F}_{t-1}$-measurable and $r^n \in L^0_+$. By the same arguments based on the elimination of non-zero components of the sequence $H_1^n$ and using the induction hypothesis we replace $H_t^n$ and $r^n$ by $\tilde H_t^k$ and $\tilde r^k$ such that $\tilde H_1^k$ converges a.s.
This means that the problem is reduced to the one with T − 1 steps.
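As a small numerical companion to the one-stage reduction used in this section, the following Python sketch treats the simplest case: one period, one risky asset, and a finite sample space on which every outcome has strictly positive probability. Under these assumptions absence of arbitrage means that the price increment either vanishes identically or takes both signs, and an equivalent martingale measure can be written down explicitly by reweighting gains and losses. The function name `one_step_emm` and the particular reweighting are illustrative assumptions, not the construction used in the proof above.

```python
from fractions import Fraction


def one_step_emm(increments, probs):
    """Return an equivalent martingale measure for a one-step, one-asset
    model on a finite sample space, or None if an arbitrage exists.

    increments: values of the price increment, one per outcome.
    probs: strictly positive probabilities of the outcomes (assumption).
    """
    # Expected positive part and expected negative part of the increment.
    pos = sum(p * x for x, p in zip(increments, probs) if x > 0)
    neg = sum(-p * x for x, p in zip(increments, probs) if x < 0)
    if pos == 0 and neg == 0:
        return list(probs)      # increment is zero: P itself is a martingale measure
    if pos == 0 or neg == 0:
        return None             # increment has one sign only: arbitrage
    # Scale gains by E[loss] and losses by E[gain]; the new mean is then zero.
    weights = [p * (neg if x > 0 else pos if x < 0 else 1)
               for x, p in zip(increments, probs)]
    total = sum(weights)
    return [w / total for w in weights]


p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
q = one_step_emm([-1, 0, 2], p)
print(q, sum(qi * x for qi, x in zip(q, [-1, 0, 2])))
```

Exact rational arithmetic is used so that the martingale property can be checked with equality rather than up to rounding.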

4 No-arbitrage criteria in continuous time

Nowadays, in the era of electronic trading, there is no doubt that continuous-time models are much more important than their discrete-time relatives. As a theoretical tool, differential equations (possibly stochastic) show an enormous advantage over difference equations. Easy to analyze, they provide a very precise description of various phenomena and, quite often, allow for tractable closed-form solutions. As we mentioned already, mathematical finance started from a continuous-time model. The unprecedented success of the Black–Scholes formula

confirmed that such models are adequate tools to describe financial market phenomena. The current trend is to go beyond the Black–Scholes world. Statistical tests for financial data reject the hypothesis that prices evolve as processes with continuous sample paths. A much better approximation can be obtained by stable or other types of Lévy processes. Apparently, semimartingales provide a natural framework for discussion of general concepts of financial theory like arbitrage and hedging problems. More general processes have also been tried; however, even a very weak form of absence of arbitrage (namely, the NFLVR-property for simple integrands) in the case of a locally bounded price process implies that it is a semimartingale, see Theorem 7.2 in [12].

4.1 No Free Lunch and separating measure

In this subsection we explain relations between the No Free Lunch (NFL) condition due to Kreps, No Free Lunch with Bounded Risk (NFLBR) due to Delbaen, and No Free Lunch with Vanishing Risk (NFLVR) introduced by Delbaen and Schachermayer (see [48], [10], [12]). Let us assume that in a one-step model of frictionless market admissible strategies are such that the convex cone $R_T^0$ (the set of final portfolio values corresponding to zero initial endowment) contains only (scalar) random variables bounded from below. As usual, let $A_T^0 := R_T^0 - L^0(\mathbf{R}_+)$. Define the set $C := A_T^0 \cap L^\infty$. We denote by $\bar C$, $\tilde C^*$, and $\bar C^*$ the norm closure, the union of weak$^*$ closures of denumerable subsets, and the weak$^*$ closure of $C$ in $L^\infty$; $C_+ := C \cap L^\infty_+$ etc. The properties NA, NFLVR, NFLBR, and NFL mean that $C_+ = \{0\}$, $\bar C_+ = \{0\}$, $\tilde C^*_+ = \{0\}$, and $\bar C^*_+ = \{0\}$, respectively. Consecutive inclusions induce the hierarchy of these properties:

$$C \subseteq \bar C \subseteq \tilde C^* \subseteq \bar C^*, \qquad \text{NA} \Leftarrow \text{NFLVR} \Leftarrow \text{NFLBR} \Leftarrow \text{NFL}.$$

Define the ESM (Equivalent Separating Measure) property as follows: there exists $\tilde P \sim P$ such that $\tilde E\xi \le 0$ for all $\xi \in R_T^0$. The following criterion for the NFL-property was established by Kreps.

Theorem 4.1 NFL $\Leftrightarrow$ ESM.

Proof ($\Leftarrow$) Let $\xi \in \bar C^* \cap L^\infty_+$. Since $d\tilde P/dP \in L^1$, there are $\xi^n \in C$ with $\tilde E\xi^n \to \tilde E\xi$. By definition, $\xi^n \le \zeta^n$ where $\zeta^n \in R_T^0$. Thus, $\tilde E\xi^n \le 0$ implying that $\tilde E\xi \le 0$ and $\xi = 0$. ($\Rightarrow$) Since $\bar C^* \cap L^\infty_+ = \{0\}$, the Kreps–Yan separation theorem given in the

Appendix provides $\tilde P \sim P$ such that $\tilde E\xi \le 0$ for all $\xi \in C$, hence, for all $\xi \in R_T^0$.

4.2 Semimartingale model

Let $(\Omega, \mathcal{F}, \mathbf{F} = (\mathcal{F}_t), P)$ be a stochastic basis, i.e. a probability space equipped with a filtration $\mathbf{F}$ satisfying the “usual conditions”. Assume for simplicity that the initial $\sigma$-algebra is trivial, the time horizon $T$ is finite, and $\mathcal{F}_T = \mathcal{F}$. A process $X = (X_t)_{t\in[0,T]}$ (right-continuous and with left limits) is a semimartingale if it can be represented as a sum of a local martingale and a process of bounded variation. Let $U_1$ be the set of all predictable processes $h$ taking values in the interval $[-1, 1]$. We denote by $h\cdot S$ the stochastic integral of a predictable process $h$ with respect to a semimartingale. The definition of this integral in its full generality, especially for vector processes (necessary for financial application), is rather complicated and we refer the reader to textbooks on stochastic calculus. The linear space $\mathcal{S}$ of semimartingales starting from zero is a Fréchet space with the quasinorm

$$D(X) := \sup_{h\in U_1} E(1 \wedge |h\cdot X_T|)$$

which induces the Émery topology, [17]. We fix in $\mathcal{S}$ a closed convex subset $\mathcal{X}^1$ of processes $X \ge -1$ which contains $0$ and satisfies the following condition: for any $X, Y \in \mathcal{X}^1$ and for any non-negative bounded predictable processes $H, G$ with $HG = 0$ the process $Z := H\cdot X + G\cdot Y$ belongs to $\mathcal{X}^1$ if $Z \ge -1$. Put $\mathcal{X} := \mathrm{cone}\,\mathcal{X}^1$. The set $\mathcal{X}$ is interpreted as the set of value processes. Put $R_T^0 := \{X_T : X \in \mathcal{X}\}$. In this rather general semimartingale model we have NFLVR $\Leftrightarrow$ NFLBR $\Leftrightarrow$ NFL in virtue of the following:

Theorem 4.2 Under NFLVR $C = \bar C^*$.

The proof of this theorem given in [34] follows closely the arguments of the Delbaen–Schachermayer paper [12]. Their setting is based on an $n$-dimensional price process $S$; the admissible strategies $H$ are predictable $\mathbf{R}^n$-valued processes for which stochastic integrals $H\cdot S$ are defined and bounded from below. The set $\mathcal{X}^1$ of all value processes $H\cdot S \ge -1$ is closed in virtue of the Mémin theorem on closedness in $\mathcal{S}$ of the space of stochastic integrals [50]. If $S$ is bounded then the process $H = \xi I_{]s,t]}$ is admissible for arbitrary $\xi \in L^\infty(\mathbf{R}^n, \mathcal{F}_s)$, and hence $\tilde E\xi(S_t - S_s) \le 0$ for any separating measure $\tilde P$. In fact, there is equality

here because one can change the sign of $\xi$. Thus, if $S$ is bounded then it is a martingale with respect to any separating measure $\tilde P$. It is an easy exercise to check that if $S$ is locally bounded (i.e. if there exists a sequence of stopping times $\tau_k$ increasing to infinity such that the stopped processes $S^{\tau_k}$ are bounded) then $S$ is a local martingale with respect to $\tilde P$. The case of arbitrary, not necessarily bounded $S$ is of a special interest because the semimartingale model includes the classical discrete-time model as a particular case. The corresponding theorem, also due to Delbaen–Schachermayer [14], involves the notions of a $\sigma$-martingale and an equivalent $\sigma$-martingale measure. A semimartingale $S$ is a $\sigma$-martingale (notation: $S \in \Sigma_m$) if $G\cdot S \in \mathcal{M}_{loc}$ for some $G$ with values in $]0, 1]$. The property E$\sigma$MM means that there is $Q \sim P$ such that $S \in \Sigma_m(Q)$.

Theorem 4.3 Let $\mathcal{X}^1$ be the set of stochastic integrals $H\cdot S \ge -1$. Then NFLVR $\Leftrightarrow$ NFLBR $\Leftrightarrow$ NFL $\Leftrightarrow$ ESM $\Leftrightarrow$ E$\sigma$MM.

The remaining non-trivial implication ESM $\Rightarrow$ E$\sigma$MM follows from

Theorem 4.4 Let $\tilde P$ be a separating measure. Then for any $\varepsilon > 0$ there is $Q \sim \tilde P$ with $\mathrm{Var}(\tilde P - Q) \le \varepsilon$ such that $S$ is a $\sigma$-martingale under $Q$.

A brief account of the Delbaen–Schachermayer theory including a short proof of the above theorem based on the inequality for the total variation distance from [40] is given in [33].

4.3 Hedging theorem and optional decomposition

Let us consider the semimartingale model based on an $n$-dimensional price process $S$. Let $C$ be a scalar random variable bounded from below and let

$$\Gamma := \{x \in \mathbf{R} : \exists \text{ admissible } H \text{ such that } x + H\cdot S_T \ge C\}.$$

In other words, $\Gamma$ is the set of initial endowments for which one can find an admissible strategy such that the terminal value of the corresponding portfolio dominates (super-replicates) the contingent claim $C$. “Admissible” means that the portfolio process is bounded from below by a constant. Obviously, if non-empty, $\Gamma$ is a semi-infinite interval. The following “hedging” theorem gives its characterization. Let $\mathcal{Q}$ be the set of probability measures $Q \sim P$ with respect to which $S$ is a local martingale.

Theorem 4.5 Assume that $\mathcal{Q} \ne \emptyset$. Then $\Gamma = [x_*, \infty[$ where

$$x_* = \sup_{Q\in\mathcal{Q}} E_Q C.$$

This general formulation is due to Kramkov [47] who noticed that the assertion is a simple corollary of the following two results.

Theorem 4.6 Assume that $\mathcal{Q} \ne \emptyset$. Let $X$ be a process bounded from below which is a supermartingale with respect to any $Q \in \mathcal{Q}$. Then there is an admissible strategy $H$ and an increasing process $A$ such that $X = X_0 + H\cdot S - A$.

The process $H\cdot S$, being bounded from below, is a local martingale with respect to every $Q \in \mathcal{Q}$ (the property that an integral with respect to a local martingale is also a local martingale if it is one-side bounded is due to Émery for the scalar case and to Ansel and Stricker [1] for the vector case). Thus, this decomposition resembles that of Doob–Meyer but it holds simultaneously for the whole set $\mathcal{Q}$; in general, it is non-unique and $A$ may not be predictable but only adapted, hence, $A$, being right-continuous, is optional. This explains why the above result is usually referred to as the optional decomposition theorem. It was proved in [47] for the case where $S$ is locally bounded; this assumption was removed in the paper [18]. The proof in [18] is probabilistic and provides an interpretation of the integrand $H$ as the Lagrange multiplier. Alternative proofs with intensive use of functional analysis can be found in [13]. For an optional decomposition with constraints see [20]; an extended discussion of the problem is given in [19]. In [43] it is shown that if $P \in \mathcal{Q}$ then the subset of $\mathcal{Q}$ formed by the measures with bounded densities is dense in $\mathcal{Q}$; this result implies, in particular, that, without any hypothesis, the subset of (local) martingale measures with bounded entropy is dense in $\mathcal{Q}$.

Proposition 4.7 Assume that $C$ is such that $\sup_{Q\in\mathcal{Q}} E_Q C < \infty$. Then there exists a process $X$ which is a supermartingale with respect to every $Q \in \mathcal{Q}$ such that $X_t = \operatorname{ess\,sup}_{Q\in\mathcal{Q}} E_Q(C\,|\,\mathcal{F}_t)$.

This result is due to El Karoui and Quenez [16]; its proof also can be found in [47].
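Proof of Theorem 4.5 The inclusion $\Gamma \subseteq [x_*, \infty[$ is obvious: if $x + H\cdot S_T \ge C$ then, since $H\cdot S$ is bounded from below and hence a $Q$-supermartingale, $x \ge E_Q C$ for every $Q \in \mathcal{Q}$. To show the opposite inclusion we may suppose that $\sup_{Q\in\mathcal{Q}} E_Q C < \infty$ (otherwise both sets are empty). Applying the optional decomposition theorem to the process

$$X_t = \operatorname{ess\,sup}_{Q\in\mathcal{Q}} E_Q(C\,|\,\mathcal{F}_t)$$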

we get that $X = x_* + H\cdot S - A$. Since $x_* + H\cdot S_T \ge X_T = C$, the result follows.
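To illustrate Theorem 4.5 in the simplest possible setting, the sketch below computes $x_* = \sup_Q E_Q C$ for a one-step model with one asset and finitely many outcomes. It relies on an assumption specific to this toy case: when an equivalent martingale measure exists, the supremum over equivalent martingale measures equals the maximum over the extreme points of the closed set of all martingale measures, and each extreme point charges either one outcome with zero increment or two outcomes with increments of opposite sign. The function name `superhedging_price` and the integer-valued inputs (chosen to keep the arithmetic exact) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def superhedging_price(increments, claim):
    """Maximum of E_Q[claim] over the extreme martingale measures of a
    one-step, one-asset model with integer price increments.

    Assumes the model is arbitrage-free, so that at least one
    martingale measure exists (otherwise max() raises ValueError).
    """
    # Point masses on outcomes where the price does not move.
    candidates = [c for x, c in zip(increments, claim) if x == 0]
    # Two-point measures on outcomes with increments of opposite sign.
    for (x1, c1), (x2, c2) in combinations(zip(increments, claim), 2):
        if x1 * x2 < 0:
            q1 = Fraction(x2, x2 - x1)   # weight making the mean increment zero
            candidates.append(q1 * c1 + (1 - q1) * c2)
    return max(candidates)


# S_0 = 100, S_T in {80, 100, 130}, call option with strike 100.
increments = [-20, 0, 30]
call = [0, 0, 30]
print(superhedging_price(increments, call))
```

In this example the endowment 12 together with the constant position $H = 3/5$ dominates the claim in every outcome, which agrees with the inclusion $[x_*, \infty[ \subseteq \Gamma$.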

4.4 Semimartingale model with transaction costs

In this model it is assumed that the price process is a semimartingale $S$ with non-negative components. The dynamics of the value process $V = V^{v,B}$ is given by the linear stochastic equation

$$V = v + V_- \bullet Y + B$$

where

$$Y^i = (1/S^i_-)\cdot S^i, \qquad B^i := \sum_{j=1}^d L^{ji} - \sum_{j=1}^d (1+\lambda^{ij})L^{ij},$$

and $L^{ij}$ is an increasing right-continuous process representing the accumulated net wealth “arriving” at a position $i$ from the position $j$. At this level of generality, criteria of absence of arbitrage are still not available but the paper of Jouini and Kallal [30] is an important contribution to the subject. It provides an NFL criterion for the model of stock market with a bid–ask spread where, instead of transaction costs coefficients, two processes are given, $\underline{S}$ and $\overline{S}$, describing the evolution of the selling and buying prices. It is shown that a certain (specifically formulated) NFL property holds if and only if there exist a probability measure $\tilde P \sim P$ and a process $\tilde S$ whose components evolve between the corresponding components of $\underline{S}$ and $\overline{S}$ such that $\tilde S$ is a martingale with respect to $\tilde P$. This result is consistent with the NA criteria for finite $\Omega$, see [41]. Apparently, the approach of Jouini and Kallal can be easily extended to the case of currency markets. However, one should take care that the setting of [30] is that of the $L^2$-theory. The limitations of the latter in the context of financial modeling are well-known; in contrast with engineering where energy constraints are welcome, they do not admit an economic interpretation. We draw the reader’s attention to the recent paper [32] of the same authors where problems of equilibrium and viability (closely related with absence of arbitrage) are discussed; see also [31] for models with short-sell constraints. The situation with the hedging theorem is slightly better. Its first versions in [6] (for a two-asset model) and in [34] were established within the $L^2$-framework. In the preprint [38] an attempt was made to work with the class of strategies for which the value process is bounded from below in the sense of partial ordering induced by the solvency cone. This class of strategies corresponds precisely to the usual definition of admissibility in the case of frictionless market. However, the result

was proved only for bounded price processes. To avoid difficulties one can look for other reasonable classes of admissible strategies. This approach was exploited in the paper [39] which contains the following hedging theorem. It is assumed that the matrix of transaction costs coefficients is constant, the first asset is the numéraire, and there exists a probability measure $\tilde P$ such that $S$ is a (true) martingale with respect to $\tilde P$. Let $\mathcal{B}_b$ be the class of strategies $B$ such that the corresponding value processes are bounded from below by a price process multiplied by (negative) constants (this definition resembles that used by Sin in the frictionless case, [55]). In particular, it is admissible to keep short a finite number of units of assets. Let $\mathcal{D}$ be the set of martingales $Z$ such that $\hat Z$ takes values in $K^*$. Notice that $\{Z : \hat Z = w\rho,\ w \in K^*\} \subseteq \mathcal{D}$ where $\rho_t := E(d\tilde P/dP\,|\,\mathcal{F}_t)$. Moreover, for $Z \in \mathcal{D}$ we have $\hat Z^1 = Z^1$; since the transaction costs are constant, it follows from the inequalities defining $K^*$ that $|\hat Z| \le \kappa Z^1$ for a certain fixed constant $\kappa$. With these remarks it is easy to conclude that $\hat Z V^{v,B}$ is always a supermartingale whatever $Z \in \mathcal{D}$ and $B \in \mathcal{B}_b$ are. Define the convex set of hedging endowments

$$\Gamma = \Gamma(\mathcal{B}_b) := \{v \in \mathbf{R}^d : \exists B \in \mathcal{B}_b \text{ such that } V_T^{v,B} \ge_K C\}$$

and the closed convex set

$$D := \{v \in \mathbf{R}^d : \hat Z_0 v \ge E\hat Z_T C \ \ \forall Z \in \mathcal{D}\}.$$

Theorem 4.8 Assume that $S$ is a continuous process and the solvency cone $K$ is proper. Then $\Gamma = D$.

The “easy” inclusion $\Gamma \subseteq D$ holds in virtue of the supermartingale property of $\hat Z V^{v,B}$ even without extra assumptions. The proof of the opposite inclusion given in [39] is based on a bipolar theorem in the space $L^0(\mathbf{R}^d, \mathcal{F}_T)$ equipped with a partial ordering. The hypotheses of the theorem and the structure of admissible strategies are used heavily in this proof. The assumption that $K$ is proper, i.e. the interior (of $K^*$) is non-empty, is essential (otherwise, $\Gamma$ may not be closed). However, the assertion $\bar\Gamma = D$ can be established for arbitrary $K$.
How to remove or relax the assumptions on continuity of S to make the result adequate to the hedging theorem without friction remains an open problem. Remark 4.9 It is important to note that the set of hedging endowments depends on the chosen class of admissible strategies. Let B0 be the class of buy-and-hold strategies with a single revision of the portfolio, namely, at time zero when the investor enters the market. It happens that in the most popular two-asset model under transaction costs with the price dynamics given by the geometric Brownian

motion where the problem is to hedge a European call option (or, more generally, a contingent claim $C = g(S_T)$) we have $\Gamma(\mathcal{B}_b) = \Gamma(\mathcal{B}_0)$. This astonishing property was conjectured by Davis and Clark [9] and proved independently in [49] and [59], see also [7] and [2] for further generalizations. More precisely, in the mentioned papers it was shown that the investor having the initial endowment in money which is a minimal one to hedge the contingent claim $C$ can hedge it using a buy-and-hold strategy from $\mathcal{B}_0$. In other words, the conclusion was that the point with zero ordinate on the boundary of $\Gamma(\mathcal{B}_b)$ belongs also to the boundary of the smaller set $\Gamma(\mathcal{B}_0)$. In fact, one can extend the arguments and prove that both sets coincide.

5 Large financial markets

5.1 Ross–Huberman APM

The main conclusion of the Capital Asset Pricing Model (CAPM) by Lintner and Sharpe is the following: the mean excess return on an asset is a linear function of its “beta”, a measure of risk associated with this asset. More precisely, we have the following result. Assume for simplicity that the riskless asset pays no interest. Suppose that the return on the $i$-th asset has mean $\mu_i$ and variance $\sigma_i^2$, the market portfolio return has mean $\mu_0$ and variance $\sigma_0^2$. Let $\gamma_i$ be the correlation coefficient between the returns on the $i$-th asset and the market portfolio. Then $\mu_i = \mu_0\beta_i$ where $\beta_i := \gamma_i\sigma_i/\sigma_0$. Unfortunately, the theoretical assumptions of CAPM are difficult to justify and its empirical content is dubious. One can expect that the empirical values of $(\beta_i, \mu_i)$ form a cloud around the so-called security market line but this phenomenon is observed only for certain data sets. The alternative approach, the Arbitrage Pricing Model (APM) suggested by Ross in [54] and placed on a solid mathematical basis by Huberman, results in a conclusion that there exists a relation between model parameters, which can be viewed as “approximately linear”, giving much better consistency with empirical data.
Based on the idea of asymptotic arbitrage, it attracted considerable attention, see, e.g., [3], [4], [26], [27]; sometimes it is referred to as the Arbitrage Pricing Theory (APT). An important reference is the note by Huberman [25] who gave a rigorous definition of the asymptotic arbitrage together with a short and transparent proof of the fundamental result of Ross. The idea of Huberman is to consider a sequence of classical one-step finite-asset models instead of a single one with infinite number of securities (in the latter case an unpleasant phenomenon may arise similar to that of doubling strategies for models with infinite time horizon). When the number of assets increases to infinity, this sequence of models can be considered as a description of a large financial market.
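Returning to the CAPM relation $\mu_i = \mu_0\beta_i$ with $\beta_i = \gamma_i\sigma_i/\sigma_0$ recalled at the beginning of this subsection, the following Python sketch evaluates beta from a finite table of return scenarios. The scenario values, the probabilities and the function name `capm_beta` are made up for illustration; the only formula used is the definition of $\beta_i$ given in the text.

```python
import math


def capm_beta(asset, market, probs):
    """beta_i = gamma_i * sigma_i / sigma_0 computed from return scenarios.

    asset, market: returns of the asset and of the market portfolio,
    scenario by scenario; probs: scenario probabilities.
    Assumes both returns have strictly positive variance.
    """
    mean_a = sum(p * a for a, p in zip(asset, probs))
    mean_m = sum(p * m for m, p in zip(market, probs))
    cov = sum(p * (a - mean_a) * (m - mean_m)
              for a, m, p in zip(asset, market, probs))
    sigma_a = math.sqrt(sum(p * (a - mean_a) ** 2 for a, p in zip(asset, probs)))
    sigma_m = math.sqrt(sum(p * (m - mean_m) ** 2 for m, p in zip(market, probs)))
    gamma = cov / (sigma_a * sigma_m)      # correlation coefficient
    return gamma * sigma_a / sigma_m


probs = [0.25, 0.50, 0.25]
market = [-0.10, 0.00, 0.20]
asset = [-0.20, 0.00, 0.40]                # hypothetical asset moving twice as much
beta = capm_beta(asset, market, probs)
mu0 = sum(p * m for m, p in zip(market, probs))
print(beta, mu0 * beta)                    # beta and the mean return predicted by CAPM
```

In this artificial example the asset return is exactly twice the market return, so the predicted mean $\mu_0\beta_i$ coincides with the true mean; for real data the pairs $(\beta_i, \mu_i)$ only scatter around the security market line, as noted above.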

A general specification of the $n$-th model $M^n$ is as follows. We are given a stochastic basis $(\Omega^n, \mathcal{F}^n, \mathbf{F}^n, P^n)$ with a convex cone $R_T^{0n}$ of square integrable (scalar) random variables. Assume for simplicity that the initial $\sigma$-algebra is trivial, $\mathcal{F}_T^n = \mathcal{F}^n$. Here $T$ stands for “terminal” and can be replaced by 1. As usual, the elements of $R_T^{0n}$ are interpreted as the terminal values of portfolios. By definition, a sequence $\xi^n \in R_T^{0n}$ realizes an asymptotic arbitrage opportunity (AAO) if the following two conditions are fulfilled ($E^n$ and $D^n$ denote the mean and variance with respect to $P^n$): (a) $\lim_n E^n\xi^n = \infty$; (b) $\lim_n D^n\xi^n = \lim_n E^n(\xi^n - E^n\xi^n)^2 = 0$. Roughly speaking, if AAO exists, then, working with large portfolios, the investor can become infinitely rich (in the mean sense) with vanishing quadratic risk. We say that the large financial market has the NAA property if there are no asymptotic arbitrage opportunities for any subsequence of market models $\{M^n\}$. A simple but useful remark: the NAA property remains the same if we replace (a) in the definition of AAO by the weaker property $\limsup_n E^n\xi^n > 0$ (“if one can become rich, one can become infinitely rich”). Let $\rho_n$ be the $L^2$-distance of $R_T^{0n}$ from the unit, i.e.

$$\rho_n := \inf_{\xi\in R_T^{0n}} E^n(\xi - 1)^2.$$

Proposition 5.1 NAA $\Leftrightarrow$ $\liminf_n \rho_n > 0$.

Proof ($\Rightarrow$) Assume that $\liminf_n \rho_n = 0$. This means (modulo passage to a subsequence) that there are $\xi^n \in R_T^{0n}$ such that $E^n(\xi^n - 1)^2 \to 0$. It follows from the identity

$$E^n(\xi^n - 1)^2 = D^n\xi^n + (E^n\xi^n - 1)^2$$

that $D^n\xi^n \to 0$ and $E^n\xi^n \to 1$, violating NAA.

($\Leftarrow$) Assume that NAA fails. This means (modulo passage to a subsequence) that there are $\xi^n \in R_T^{0n}$, $\xi^n \ne 0$, satisfying (a) and (b). It follows that

$$E^n(\xi^n)^2 = D^n\xi^n + (E^n\xi^n)^2 \to \infty.$$

Put $\tilde\xi^n := \xi^n/\sqrt{E^n(\xi^n)^2}$. Then $\tilde\xi^n \in R_T^{0n}$,

$$D^n\tilde\xi^n = (1/E^n(\xi^n)^2)\,D^n\xi^n \to 0$$

and

$$(E^n\tilde\xi^n)^2 = E^n(\tilde\xi^n)^2 - D^n\tilde\xi^n = 1 - D^n\tilde\xi^n \to 1.$$

Thus,

$$E^n(\tilde\xi^n - 1)^2 = D^n\tilde\xi^n + (E^n\tilde\xi^n - 1)^2 \to 0$$

and we get a contradiction.

Suppose now that in the $n$-th model we are given a $d$-dimensional square integrable price process $(S_t^n)$ where $t \in \{0, T\}$. In general, $d = d(n)$. Suppose that $S_0^{in} = 1$ (this is just a choice of scales). The crucial hypothesis of the $k$-factor APM is that there are $k$ common sources of randomness affecting the prices of all securities and there are also individual sources of randomness related to each security. Specifically, we suppose that

$$\Delta S_T^{in} = \mu^{in} + \sum_{j=1}^k \zeta_j^n b_j^{in} + \eta^{in}, \qquad i \le d,$$

or, in vector notation,

$$\Delta S_T^n = \mu^n + \sum_{j=1}^k \zeta_j^n b_j^n + \eta^n.$$

Here $\mu^n, b_j^n \in \mathbf{R}^d$, the scalar random variables $\zeta_j^n$ with zero means are square integrable and the $d$-dimensional random vector $\eta^n$ with zero mean has uncorrelated components (representing randomness proper to each asset). Assume that $D^n\eta^{in} \le C$ for all $i \le d$ and $n \in \mathbf{N}$ for a certain constant $C$. A (self-financing) portfolio strategy $H^n$ is a vector in $\mathbf{R}^d$ such that

$$H^n\mathbf{1}_d := \sum_{i=1}^d H^{in} = 0.$$

At the final date the corresponding portfolio value is

$$V_T^n = H^n\Delta S_T^n = \sum_{i=1}^d H^{in}\Delta S_T^{in}$$

and these random variables form the set $R_T^{0n}$.

Lemma 5.2 Let $L_n$ be the linear subspace in $\mathbf{R}^d$ spanned by the set $\{\mathbf{1}_d, b_j^n, j \le k\}$ and let $c^n$ be the projection of $\mu^n$ onto $L_n^\perp$. Then

$$\text{NAA} \ \Rightarrow\ \sup_n |c^n| < \infty.$$

Proof Let $a_n$ be a real number. The vector $H^n := a_n c^n$ (being orthogonal to $\mathbf{1}_d$) is a self-financing strategy with the corresponding terminal value

$$V_T^n = a_n|c^n|^2 + a_n c^n\eta^n.$$

It follows that $E^nV_T^n = a_n|c^n|^2$,

$$D^nV_T^n = a_n^2E^n(c^n\eta^n)^2 = a_n^2\sum_{i=1}^d (c^{in})^2D^n\eta^{in} \le Ca_n^2|c^n|^2.$$

In particular, for $a_n = |c^n|^{-3/2}$ we have $E^nV_T^n = |c^n|^{1/2}$ and $D^nV_T^n \le C|c^n|^{-1}$, i.e. an asymptotic arbitrage opportunity for any subsequence along which $|c^n|$ converges to infinity.

As is easily seen from the proof, the conditions of the lemma are equivalent if $D^n\eta^{in} \ge \varepsilon > 0$ for all $i$ and $n$.

Proposition 5.3 Assume that NAA holds. Then there exist a constant $A$ and real-valued sequences $\{r^n\}$, $\{g_j^n\}$, $j \le k$, such that

$$\Big|\mu^n - r^n\mathbf{1}_d - \sum_{j=1}^k g_j^n b_j^n\Big|^2 := \sum_{i=1}^d\Big(\mu^{in} - r^n - \sum_{j=1}^k g_j^n b_j^{in}\Big)^2 \le A.$$

The assertion is an obvious corollary of the above lemma: the vector $c^n$ is a difference of $\mu^n$ and the projection of $\mu^n$ onto $L_n$; the latter is a linear combination of the generating vectors $\mathbf{1}_d, b_1^n, \ldots, b_k^n$. Of course, if the generators are not linearly independent, the coefficients $r^n, g_1^n, \ldots, g_k^n$ are not uniquely defined.

The most interesting case of the APM is the “stationary” one where all random variables “live” on the same probability space and do not depend on $n$. All model parameters also do not depend on $n$ except the dimension $d = n$. In other words, we are given infinite-dimensional vectors $\mu = (\mu^1, \mu^2, \ldots)$, $\eta = (\eta^1, \eta^2, \ldots)$, etc., and the ingredients of the $n$-th model, $\mu^n$, $\eta^n$, etc., are composed of the first $n$ coordinates of these vectors. One can think that the “real-world” market has an infinite number of securities, enumerated somehow, and the agent uses the first $n$ of them in his portfolios. That is, the increment of the $n$-dimensional price process in the $n$-th model is

$$\Delta S_T^i = \mu^i + \sum_{j=1}^k \zeta_j b_j^i + \eta^i, \qquad i \le n.$$

Theorem 5.4 Assume that NAA holds. Then there are constants $r$ and $g_j$, $j \le k$, such that

$$\sum_{i=1}^\infty\Big(\mu^i - r - \sum_{j=1}^k g_j b_j^i\Big)^2 < \infty.$$

Proof Let us consider the vector space spanned by the infinite-dimensional vectors $\mathbf{1}_\infty = (1, 1, \ldots)$, $b_j = (b_j^1, b_j^2, \ldots)$, $j \le k$. Without loss of generality we may assume that $\mathbf{1}_\infty$, $b_j$, $j \le l$, is a basis in this space. There is $n_0$ such that for every $n \ge n_0$ the vectors formed by the first $n$ components of the latter are linearly independent. For every $n \ge n_0$ we define the set

$$K^n := \Big\{(r, g_1, \ldots, g_l, 0, \ldots, 0) \in \mathbf{R}^{k+1} : \sum_{i=1}^n\Big(\mu^i - r - \sum_{j=1}^k g_j b_j^i\Big)^2 \le A\Big\}$$

where choosing $A$ as in Proposition 5.3 ensures that $K^n$ is non-empty. Clearly, $K^n$ is closed and $K^{n+1} \subseteq K^n$. It is easily seen that $K^n$ is bounded (otherwise we could construct a linear relation between the vectors assumed to be linearly independent). Thus, the sets $K^n$ are compact, $\cap_{n\ge n_0}K^n \ne \emptyset$, and the result follows.

In the case where the numéraire is a traded security, say, the first one (i.e. $\Delta S_T^{1n} = 0$) we can take $r^n = 0$ for all $n$ in Proposition 5.3 and $r = 0$ in Theorem 5.4. To see this, we repeat the arguments above with “truncated” price vectors and strategies, the first component being excluded. In this specification an admissible strategy is just a vector from $\mathbf{R}^{d-1}$ and the projection onto the vector with unit coordinates is not needed.

To make the relation between CAPM and APM clear, let us consider the one-factor stationary model where the numéraire is a traded security and the increments of the risky assets (enumerating from zero) are of the following structure:

$$\Delta S_T^0 = \mu^0 + b^0\zeta, \qquad \Delta S_T^i = \mu^i + b^i\zeta + \eta^i, \quad i \ge 1,$$

where all random variables $\zeta$ and $\eta^i$ are uncorrelated and have zero means. Assume that $D\eta^i \le C$. The 0-th asset plays a particular role: all other price movements are conditionally uncorrelated given $\Delta S_T^0$. It can be viewed as a kind of “market portfolio” or “market index”. If there is no asymptotic arbitrage, then there exists a constant $g$ such that

$$\sum_{i=0}^\infty (\mu^i - gb^i)^2 < \infty,$$

i.e. $\mu^i = gb^i + u^i$ where $u^i \to 0$. If the residual $u^0$ is small, then $\mu^0 \approx gb^0$. We can use the latter relation to specify $g$ and conclude that $\mu^i \approx \mu^0\beta^i$ (at least, for sufficiently large $i$) with $\beta^i := b^i/b^0$. Of course, this reasoning is far from being rigorous: the empirical data, even being in accordance with APM, may or may not follow the conclusion of CAPM. Note that the approach of APT is based on the assumption that the agents have certain risk-preferences and in the asymptotic setting they may accept the

possibility of large losses with small probabilities; the variance is taken as an appropriate measure of risk. A specific feature of the classical APT is that it does not deal with the problem of existence of equivalent martingale measures which is the key point of the Fundamental Theorem of Asset Pricing. For a long time these two arbitrage theories were considered as unrelated. In [35] an approach was suggested which puts together basic ideas of both of them and allows us to solve the long-standing problem of extension of APT to the continuous-time setting. A brief account of its further development is given in the next subsections.
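The mechanism behind Lemma 5.2 can be made concrete with a few numbers. The Python sketch below takes the strategy $H^n = a_nc^n$ with the scaling $a_n = |c^n|^{-3/2}$ from the proof and evaluates the resulting mean and the variance bound as functions of $|c^n|$ alone. The function name `scaled_moments` and the sample values of $|c^n|$ are hypothetical; the formulas are those of the proof, with $C$ the assumed uniform bound on the variances of the idiosyncratic terms.

```python
def scaled_moments(c_norm, C=1.0):
    """Mean and variance bound of V_T^n for H^n = a_n c^n, a_n = |c^n|^(-3/2).

    c_norm: the Euclidean norm |c^n| (assumed strictly positive).
    C: uniform bound on the variances of the idiosyncratic terms.
    """
    a = c_norm ** -1.5
    mean = a * c_norm ** 2                    # E V_T^n = a_n |c^n|^2 = |c^n|^(1/2)
    variance_bound = C * a ** 2 * c_norm ** 2  # D V_T^n <= C a_n^2 |c^n|^2 = C / |c^n|
    return mean, variance_bound


for c_norm in (4.0, 100.0, 10_000.0):
    print(c_norm, scaled_moments(c_norm))
```

As $|c^n|$ grows the mean increases like $|c^n|^{1/2}$ while the variance bound decays like $|c^n|^{-1}$, which is exactly the pair of conditions (a) and (b) defining an asymptotic arbitrage opportunity.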

5.2 Asymptotic arbitrage and contiguity The theory of large financial markets contains four principal ingredients: basic concepts, functional-analytic methods, probabilistic results, and analysis of specific models. The fundamentals of this theory were established in [35] where the definitions of asymptotic arbitrage of the first and the second kind were suggested. Assuming the uniqueness of equivalent martingale measures (i.e. the completeness) for each market model, the authors proved necessary and sufficient conditions for NAA1 and NAA2 in terms of contiguity of sequences of equivalent martingale measures and objective (“historical”) probabilities. A particular model of a “large Black–Scholes market” (where the price processes are correlated geometric Brownian motions) was investigated. It was shown that the boundedness condition similar to that of Ross–Huberman can be obtained as a direct application of the Liptser–Shiryaev criteria of contiguity in terms of the Hellinger processes. The restricting uniqueness hypothesis was removed by Klein and Schachermayer (see [45], [46], and [44]). They discovered the importance of duality methods of geometric functional analysis in the context of large financial markets and found non-trivial extensions of NAA1 and NAA2 criteria for the case of incomplete market models. These criteria were complemented in [37] by new ones. In particular, it was shown that the strong asymptotic arbitrage is equivalent to the complete asymptotic separability of the historic probabilities and equivalent martingale measures. Our presentation follows the latter paper where also several modifications of classical models were analyzed and necessary and sufficient conditions for absence of asymptotic arbitrage were obtained in terms of model specifications. 
In the terminology of [37], a large financial market is a sequence of ordinary semimartingale models of a frictionless market $\{(\mathbf{B}^n, S^n, T^n)\}$, where $\mathbf{B}^n$ is a stochastic basis with the trivial initial $\sigma$-algebra. A semimartingale price process $S^n$ takes values in $\mathbf{R}^d$ for some $d = d(n)$. To simplify notation we shall often omit the superscript for the time horizon.

We denote by $\mathcal{Q}^n$ the set of all probability measures $Q^n$ equivalent to $P^n$ such that $S^n$ is a local martingale with respect to $Q^n$. It is assumed that each set $\mathcal{Q}^n$ of equivalent local martingale measures is non-empty. We define a trading strategy on $(\mathbf{B}^n, S^n, T^n)$ as a predictable process $H^n$ with values in $\mathbf{R}^d$ such that the stochastic integral $H^n \cdot S^n$ with respect to the semimartingale $S^n$ is well-defined on $[0, T]$. For a trading strategy $H^n$ and an initial endowment $x^n$ the value process is $V^n = V(n, x^n, H^n) := x^n + H^n \cdot S^n$.

A sequence $V^n$ realizes asymptotic arbitrage of the first kind (AA1) if
(1a) $V^n_t \ge 0$ for all $t \le T$;
(1b) $\lim_n V^n_0 = 0$ (i.e. $\lim_n x^n = 0$);
(1c) $\lim_n P^n(V^n_T \ge 1) > 0$.

A sequence $V^n$ realizes asymptotic arbitrage of the second kind (AA2) if
(2a) $V^n_t \le 1$ for all $t \le T$;
(2b) $\lim_n V^n_0 > 0$;
(2c) $\lim_n P^n(V^n_T \ge \varepsilon) = 0$ for any $\varepsilon > 0$.

A sequence $V^n$ realizes strong asymptotic arbitrage of the first kind (SAA1) if
(3a) $V^n_t \ge 0$ for all $t \le T$;
(3b) $\lim_n V^n_0 = 0$ (i.e. $\lim_n x^n = 0$);
(3c) $\lim_n P^n(V^n_T \ge 1) = 1$.

One can continue and also give the definition of SAA2. It is easy to understand that the existence of SAA1 implies the existence of SAA2 and vice versa (provided that there are no specific constraints). So existence criteria are the same in both cases.

A large security market $\{(\mathbf{B}^n, S^n, T^n)\}$ has no asymptotic arbitrage of the first kind (respectively, of the second kind) if for any subsequence $(m)$ there are no value processes $V^m$ realizing asymptotic arbitrage of the first kind (respectively, of the second kind) for $\{(\mathbf{B}^m, S^m, T^m)\}$.

To formulate the results we need to extend some notions from measure theory. Let $\mathcal{Q} = \{Q\}$ be a family of probabilities on a measurable space $(\Omega, \mathcal{F})$. Define the upper and lower envelopes of measures from $\mathcal{Q}$ as the set functions with

$$\overline{Q}(A) := \sup_{Q \in \mathcal{Q}} Q(A), \qquad \underline{Q}(A) := \inf_{Q \in \mathcal{Q}} Q(A), \qquad A \in \mathcal{F}.$$

We say that $\mathcal{Q}$ is dominated if any element of $\mathcal{Q}$ is absolutely continuous with respect to some fixed probability measure. In our setting, where for every $n$ a family $\mathcal{Q}^n$ of equivalent local martingale measures is given, we use the obvious notations $\overline{Q}^n$ and $\underline{Q}^n$.
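To see why the envelopes are "set functions other than measures", the following minimal sketch evaluates them on a hypothetical three-point sample space with a two-element family. The space, the family and all function names are assumptions made for illustration only; exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

# Hypothetical three-point sample space with a two-element family of probabilities.
OMEGA = frozenset({"a", "b", "c"})
FAMILY = [
    {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)},
    {"a": Fraction(1, 5), "b": Fraction(2, 5), "c": Fraction(2, 5)},
]


def prob(measure, event):
    """Q(A) for an event A given as a set of sample points."""
    return sum((measure[w] for w in event), Fraction(0))


def upper_envelope(family, event):
    """Supremum of Q(A) over the family."""
    return max(prob(q, event) for q in family)


def lower_envelope(family, event):
    """Infimum of Q(A) over the family."""
    return min(prob(q, event) for q in family)


A, B = frozenset({"a"}), frozenset({"b"})
upper_union = upper_envelope(FAMILY, A | B)
upper_sum = upper_envelope(FAMILY, A) + upper_envelope(FAMILY, B)
lower_union = lower_envelope(FAMILY, A | B)
lower_sum = lower_envelope(FAMILY, A) + lower_envelope(FAMILY, B)
print(upper_union, upper_sum, lower_union, lower_sum)
```

In this example the upper envelope of the disjoint union is strictly smaller than the sum of the upper envelopes, and the lower envelope is strictly larger than the sum of the lower envelopes, so neither set function is additive. The two are linked by complementation: the upper envelope of $A$ equals one minus the lower envelope of the complement of $A$.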


Generalizing in a straightforward way the well-known notion of contiguity to set functions other than measures, we introduce the following definitions.

The sequence $(P^n)$ is contiguous with respect to $(\overline{Q}^n)$ (notation: $(P^n) \triangleleft (\overline{Q}^n)$) when the implication

$$\lim_{n\to\infty} \overline{Q}^n(A^n) = 0 \quad\Rightarrow\quad \lim_{n\to\infty} P^n(A^n) = 0$$

holds for any sequence $A^n \in \mathcal{F}^n$, $n \ge 1$.

Obviously, $(P^n) \triangleleft (\overline{Q}^n)$ if and only if the implication

$$\lim_{n\to\infty} \sup_{Q \in \mathcal{Q}^n} E_Q g^n = 0 \quad\Rightarrow\quad \lim_{n\to\infty} E_{P^n} g^n = 0$$

holds for any uniformly bounded sequence $g^n$ of positive $\mathcal{F}^n$-measurable random variables.

A sequence $(P^n)$ is asymptotically separable from $(\overline{Q}^n)$ (notation: $(P^n) \triangle (\overline{Q}^n)$) if there exists a subsequence $(m)$ with sets $A^m \in \mathcal{F}^m$ such that

$$\lim_{m\to\infty} \overline{Q}^m(A^m) = 0, \qquad \lim_{m\to\infty} P^m(A^m) = 1.$$
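The definition of asymptotic separability can be illustrated on a toy sequence of models that is not taken from the text: under $P^n$ one observes $n$ fair coin tosses, while the family $\mathcal{Q}^n$ consists of two biased coins (heads probabilities 0.7 and 0.8, both hypothetical). With the sets $A^n$ = {at most $3n/5$ heads}, the sketch below computes $P^n(A^n)$ and the upper envelope $\overline{Q}^n(A^n)$ by direct summation of binomial probabilities.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k), computed by direct summation."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def separation_probabilities(n, p_ref=0.5, q_biases=(0.7, 0.8)):
    """Return (P^n(A^n), sup over the family of Q(A^n)) for A^n = {at most 3n/5 heads}."""
    k = (3 * n) // 5  # integer threshold, avoids floating-point rounding
    p_prob = binom_cdf(k, n, p_ref)
    q_upper = max(binom_cdf(k, n, q) for q in q_biases)
    return p_prob, q_upper


for n in (50, 200, 400):
    print(n, separation_probabilities(n))
```

As $n$ grows the first number approaches 1 and the second approaches 0, which is the pattern required in the definition. A finite computation of this kind only suggests the limiting behaviour; here the limits themselves follow from the law of large numbers.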

Proposition 5.5 The following conditions are equivalent:
(a) there is no asymptotic arbitrage of the first kind (NAA1);
(b) $(P^n) \triangleleft (\overline{Q}^n)$;
(c) there exists a sequence $R^n \in \mathcal{Q}^n$ such that $(P^n) \triangleleft (R^n)$.

Proof (b) ⇒ (a) Let $(V^n)$ be a sequence of value processes realizing asymptotic arbitrage of the first kind. For any $Q \in \mathcal{Q}^n$ the process $V^n$ is a non-negative local $Q$-martingale, hence a $Q$-supermartingale, and

$$\sup_{Q \in \mathcal{Q}^n} E_Q V^n_T \le \sup_{Q \in \mathcal{Q}^n} E_Q V^n_0 = x^n \to 0$$

by (1b). Thus,

$$\overline{Q}^n(V^n_T \ge 1) := \sup_{Q \in \mathcal{Q}^n} Q(V^n_T \ge 1) \to 0$$

and, by contiguity $(P^n) \triangleleft (\overline{Q}^n)$, we have $P^n(V^n_T \ge 1) \to 0$ in contradiction to (1c).

(a) ⇒ (b) Assume that $(P^n)$ is not contiguous with respect to $(\overline{Q}^n)$. Taking, if necessary, a subsequence we can find sets $\Gamma^n \in \mathcal{F}^n$ such that $\overline{Q}^n(\Gamma^n) \to 0$, $P^n(\Gamma^n) \to \gamma$ as $n \to \infty$ where $\gamma > 0$. According to Proposition 4.7 the process $X^n_t = \operatorname{ess\,sup}_{Q \in \mathcal{Q}^n} E_Q(I_{\Gamma^n} \mid \mathcal{F}^n_t)$ is a supermartingale with respect to any $Q \in \mathcal{Q}^n$. By Theorem 4.6 it admits a decomposition $X^n = X^n_0 + H^n \cdot S^n - A^n$ where $A^n$ is an increasing process. Let us show that $V^n := X^n_0 + H^n \cdot S^n$ are value processes realizing AA1. Indeed, $V^n = X^n + A^n \ge 0$,

$$V^n_0 = \sup_{Q \in \mathcal{Q}^n} E_Q I_{\Gamma^n} = \overline{Q}^n(\Gamma^n) \to 0,$$

and

$$\lim_n P^n(V^n_T \ge 1) \ge \lim_n P^n(X^n_T \ge 1) = \lim_n P^n(X^n_T = 1) = \lim_n P^n(\Gamma^n) = \gamma > 0.$$

(b) ⇔ (c) This relation follows from the convexity of $\mathcal{Q}^n$ and a general result given below.

Proposition 5.6 Assume that for any $n \ge 1$ we are given a probability space $(\Omega^n, \mathcal{F}^n, P^n)$ with a dominated family $\mathcal{Q}^n$ of probability measures. Then the following conditions are equivalent:
(a) $(P^n) \triangleleft (\overline{Q}^n)$;
(b) there is a sequence $R^n \in \operatorname{conv} \mathcal{Q}^n$ such that $(P^n) \triangleleft (R^n)$;
(c) the following equality holds:

$$\lim_{\alpha \downarrow 0} \liminf_{n\to\infty} \sup_{Q \in \operatorname{conv} \mathcal{Q}^n} H(\alpha, Q, P^n) = 1,$$

where $H(\alpha, Q, P) = \int (dQ)^{\alpha} (dP)^{1-\alpha}$ is the Hellinger integral of order $\alpha \in \,]0, 1[$.

The sequence of sets of probability measures $(\mathcal{Q}^n)$ is said to be weakly contiguous with respect to $(P^n)$ (notation: $(\mathcal{Q}^n) \triangleleft_w (P^n)$) if for any $\varepsilon > 0$ there are $\delta > 0$ and a sequence of measures $Q^n \in \mathcal{Q}^n$ such that for any sequence $A^n \in \mathcal{F}^n$ with the property $\limsup_n P^n(A^n) < \delta$ we have $\limsup_n Q^n(A^n) < \varepsilon$. For the case where the sets $\mathcal{Q}^n$ are singletons containing only the measure $Q^n$, the relation $(\mathcal{Q}^n) \triangleleft_w (P^n)$ means simply that $(Q^n) \triangleleft (P^n)$. Obviously, the property $(\mathcal{Q}^n) \triangleleft_w (P^n)$ can be formulated in terms of random variables: for any $\varepsilon > 0$ there are $\delta > 0$ and a sequence of measures $Q^n \in \mathcal{Q}^n$ such that for any sequence of $\mathcal{F}^n$-measurable random variables $g^n$ taking values in the interval $[0, 1]$ with the property $\limsup_n E_{P^n} g^n < \delta$, we have $\limsup_n E_{Q^n} g^n < \varepsilon$.

Proposition 5.7 The following conditions are equivalent:
(a) there is no asymptotic arbitrage of the second kind (NAA2);
(b) $(\underline{Q}^n) \triangleleft (P^n)$;
(c) $(\mathcal{Q}^n) \triangleleft_w (P^n)$.
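For intuition about the Hellinger integral appearing in Proposition 5.6, here is a minimal sketch on a finite sample space, where the integral reduces to the sum of $q(\omega)^{\alpha} p(\omega)^{1-\alpha}$ over sample points. The three distributions are hypothetical. On a finite space the value tends to $P(q > 0)$ as $\alpha \downarrow 0$, so a limit equal to 1 corresponds to $P$ being absolutely continuous with respect to $Q$.

```python
def hellinger_integral(alpha, q, p):
    """H(alpha, Q, P) = sum of q^alpha * p^(1 - alpha) over a finite sample space."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return sum(qw**alpha * pw ** (1.0 - alpha) for qw, pw in zip(q, p))


# Hypothetical distributions on a three-point space.
p = [0.5, 0.3, 0.2]
q_equivalent = [0.2, 0.3, 0.5]  # same support as p
q_partial = [1.0, 0.0, 0.0]     # charges only the first point

for alpha in (0.5, 0.1, 1e-6):
    print(alpha,
          hellinger_integral(alpha, q_equivalent, p),
          hellinger_integral(alpha, q_partial, p))
```

For the equivalent pair the values approach 1 as $\alpha$ decreases, while for the second pair they approach $P(q > 0) = 0.5$.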


The proof of Proposition 5.7 is similar to that of Proposition 5.5. Notice that the conditions (b) in both statements look rather symmetric in contrast to the conditions (c). In general, the condition (b) of Proposition 5.7 may hold though a sequence $Q^n \in \mathcal{Q}^n$ such that $(Q^n) \triangleleft (P^n)$ does not exist (see an example in [45]). The reason is that the set functions $\overline{Q}$ and $\underline{Q}$ are of a radically different nature.

The following assertion gives criteria of existence of strong asymptotic arbitrage.

Proposition 5.8 The following conditions are equivalent:
(a) there is SAA1;
(b) $(P^n) \triangle (\overline{Q}^n)$;
(c) $(\underline{Q}^n) \triangle (P^n)$;
(d) $(P^n) \triangle (Q^n)$ for any sequence $Q^n \in \mathcal{Q}^n$.

Let $P$ and $\tilde{P}$ be two equivalent probability measures on a stochastic basis $\mathbf{B}$ and let $R := (P + \tilde{P})/2$. Let us denote by $z$ and $\tilde{z}$ the density processes of $P$ and $\tilde{P}$ with respect to $R$. For arbitrary $\alpha \in \,]0, 1[$ the process $Y = Y(\alpha) := z^{\alpha} \tilde{z}^{1-\alpha}$ is an $R$-supermartingale admitting the multiplicative decomposition $Y = M \mathcal{E}(-h)$ where $M = M(\alpha)$ is a local $R$-martingale, $\mathcal{E}$ is the Doléans-Dade exponential, and $h = h(\alpha, P, \tilde{P})$ is an increasing predictable process, $h_0 = 0$, called the Hellinger process of order $\alpha$. These Hellinger processes play an important role in criteria of absolute continuity and, more generally, contiguity of probability measures, see [28] for details.

In the abstract setting of Proposition 5.6, when the probability spaces are equipped with filtrations (i.e. they are stochastic bases), we have the following results, which are helpful in the analysis of particular models arising in mathematical finance.

Theorem 5.9 The following conditions are equivalent:
(a) $(P^n) \triangleleft (\overline{Q}^n)$;
(b) for all $\varepsilon > 0$

$$\lim_{\alpha \downarrow 0} \limsup_{n\to\infty} \inf_{Q \in \operatorname{conv} \mathcal{Q}^n} P^n\bigl(h_{\infty}(\alpha, Q, P^n) \ge \varepsilon\bigr) = 0.$$

Theorem 5.10 Assume that the family $\mathcal{Q}^n$ is convex and dominated for any $n$. Then the following conditions are equivalent:
(a) $(\underline{Q}^n) \triangleleft (P^n)$;
(b) for all $\varepsilon > 0$

$$\lim_{\alpha \downarrow 0} \limsup_{n\to\infty} \inf_{Q \in \mathcal{Q}^n} Q\bigl(h_{\infty}(\alpha, P^n, Q) \ge \varepsilon\bigr) = 0.$$


The concept of contiguity is useful in relation with an important question whether the option prices calculated in “approximating” models converge to the “true” option price, see [24] and [58].

5.3 A large BS-market

Let $(\Omega, \mathcal{F}, \mathbf{F} = (\mathcal{F}_t), P)$ be a stochastic basis with a countable set of independent one-dimensional Wiener processes $w^i$, $i \in \mathbf{Z}_+$, $\mathbf{w}^n = (w^0, \ldots, w^n)$, and let $\mathbf{F}^n = (\mathcal{F}^n_t)$ be a filtration generated by $\mathbf{w}^n$. For simplicity, assume that $T$ is fixed. The behavior of the stock prices is described by the following stochastic differential equations:

$$dX^0_t = \mu_0 X^0_t\,dt + \sigma_0 X^0_t\,dw^0_t,$$
$$dX^i_t = \mu_i X^i_t\,dt + \sigma_i X^i_t\,(\gamma_i\,dw^0_t + \bar{\gamma}_i\,dw^i_t), \qquad i \in \mathbf{N},$$

with (deterministic strictly positive) initial points $X^i_0$. Here $\gamma_i$ is a function taking values in $[0, 1[$ and $\gamma_i^2 + \bar{\gamma}_i^2 = 1$. We assume that $\mu_i, \sigma_i \in L^2[0, T]$ and $\sigma_i > 0$. Notice that the process $\xi^i$ with

$$d\xi^i_t = \gamma_i\,dw^0_t + \bar{\gamma}_i\,dw^i_t, \qquad \xi^i_0 = 0,$$

is a Wiener process. Thus, in the case of constant coefficients price processes are geometric Brownian motions as in the classical case of Black and Scholes. The model is designed to reflect the fact that in the market there are two different types of randomness: the first type is proper to each stock while the second one originates from some common source and it is accumulated in a “stock index” (or “market portfolio”) whose evolution is described by the first equation. Set

$$\beta_i := \frac{\gamma_i \sigma_i}{\sigma_0} = \frac{\gamma_i \sigma_i \sigma_0}{\sigma_0^2}.$$

In the case of deterministic coefficients, $\beta_i$ is a well-known measure of risk which is the covariance between the return on the asset with number $i$ and the return on the index, divided by the variance of the return on the index. Let $\mathbf{b}^n(t) := (b_0(t), b_1(t), \ldots, b_n(t))$ where

$$b_0 := -\frac{\mu_0}{\sigma_0}, \qquad b_i := \frac{\beta_i \mu_0 - \mu_i}{\sigma_i \bar{\gamma}_i}.$$

Assume that for every $n$

$$\int_0^T |\mathbf{b}^n(t)|^2\,dt < \infty.$$

We consider the stochastic basis $\mathbf{B}^n = (\Omega, \mathcal{F}, \mathbf{F}^n = (\mathcal{F}^n_t)_{t \le T}, P^n)$ with the $(n + 1)$-dimensional semimartingale $S^n := (X^0_t, X^1_t, \ldots, X^n_t)$ and $P^n := P|\mathcal{F}^n_T$. The


sequence $\{(\mathbf{B}^n, S^n, T)\}$ is a large security market. In our case each $(\mathbf{B}^n, S^n, T)$ is a model of a complete market and the set $\mathcal{Q}^n$ is a singleton which consists of the measure $Q^n = Z_T(\mathbf{b}^n) P^n$ where

$$Z_T(\mathbf{b}^n) := \exp\left\{ \int_0^T (\mathbf{b}^n(t), d\mathbf{w}^n_t) - \frac{1}{2} \int_0^T |\mathbf{b}^n(t)|^2\,dt \right\}.$$

The Hellinger process has an explicit expression

$$h(\alpha, Q^n, P^n) = \frac{\alpha(1 - \alpha)}{2} \int_0^T \left[ \left( \frac{\mu_0}{\sigma_0} \right)^2 + \sum_{i=1}^{n} \left( \frac{\mu_i - \beta_i \mu_0}{\sigma_i \bar{\gamma}_i} \right)^2 \right] ds.$$

As a corollary of Theorem 5.9 we have

Proposition 5.11 The condition NAA1 holds if and only if

$$\int_0^T \left[ \left( \frac{\mu_0}{\sigma_0} \right)^2 + \sum_{i=1}^{\infty} \left( \frac{\mu_i - \beta_i \mu_0}{\sigma_i \bar{\gamma}_i} \right)^2 \right] ds < \infty.$$

In fact, in this model both conditions NAA1 and NAA2 hold simultaneously. In the particular case of constant coefficients, finite $T$, and $0 < c \le \sigma_i \bar{\gamma}_i \le C$ we get that the property NAA1 holds if and only if

$$\sum_{i=1}^{\infty} (\mu_i - \beta_i \mu_0)^2 < \infty,$$

i.e. the Huberman–Ross boundedness is fulfilled.

5.4 One-factor APM revisited

We consider the “stationary” one-factor model of the following specific structure (cf. with the model given at the end of Subsection 5.1). Let $(\epsilon_i)_{i \ge 0}$ be independent random variables given on a probability space $(\Omega, \mathcal{F}, P)$ and taking values in a finite interval $[-N, N]$, $E\epsilon_i = 0$, $E\epsilon_i^2 = 1$. At time zero all asset prices $S^i_0 = 1$ and

$$S^0_T = 1 + \mu_0 + \sigma_0 \epsilon_0, \qquad S^i_T = 1 + \mu_i + \sigma_i(\gamma_i \epsilon_0 + \bar{\gamma}_i \epsilon_i), \quad i \ge 1.$$

The coefficients here are deterministic, $\sigma_i > 0$, $\bar{\gamma}_i > 0$ and $\gamma_i^2 + \bar{\gamma}_i^2 = 1$. The asset with number zero is interpreted as a market portfolio, $\gamma_i$ is the correlation coefficient between the rate of return for the market portfolio and the rate of return for the asset with number $i$. For $n \ge 0$ we consider the stochastic basis $\mathbf{B}^n = (\Omega, \mathcal{F}^n, \mathbf{F}^n = (\mathcal{F}^n_t)_{t \in \{0,1\}}, P^n)$ with the $(n + 1)$-dimensional random process $\mathbf{S}^n := (S^0_t, S^1_t, \ldots, S^n_t)_{t \in \{0,1\}}$ where $\mathcal{F}^n_0$ is the trivial $\sigma$-algebra, $\mathcal{F}^n_1 = \mathcal{F}^n := \sigma\{\epsilon_0, \ldots, \epsilon_n\}$, and $P^n = P|\mathcal{F}^n$. According to our definition, the sequence $\mathbf{M} = \{(\mathbf{B}^n, \mathbf{S}^n, 1)\}$ is a large security market. Let $\beta_i := \gamma_i \sigma_i / \sigma_0$,

$$b_0 := -\frac{\mu_0}{\sigma_0}, \qquad b_i := \frac{\mu_0 \beta_i - \mu_i}{\sigma_i \bar{\gamma}_i}, \quad i \ge 1.$$

It is convenient to rewrite the price increments as follows:

$$S^0_T = 1 + \sigma_0(\epsilon_0 - b_0), \qquad S^i_T = 1 + \sigma_i \gamma_i(\epsilon_0 - b_0) + \sigma_i \bar{\gamma}_i(\epsilon_i - b_i), \quad i \ge 1.$$
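As a quick consistency check of this rewriting (a sketch that uses only the definitions of $\beta_i$, $b_0$ and $b_i$ given above, writing $\epsilon_i$ for the noise variables), substitute $b_0 = -\mu_0/\sigma_0$ and $b_i = (\mu_0 \beta_i - \mu_i)/(\sigma_i \bar{\gamma}_i)$:

$$\sigma_0(\epsilon_0 - b_0) = \sigma_0 \epsilon_0 + \mu_0,$$

$$\sigma_i \gamma_i(\epsilon_0 - b_0) + \sigma_i \bar{\gamma}_i(\epsilon_i - b_i) = \sigma_i \gamma_i \epsilon_0 + \frac{\sigma_i \gamma_i}{\sigma_0}\,\mu_0 + \sigma_i \bar{\gamma}_i \epsilon_i - (\mu_0 \beta_i - \mu_i) = \mu_i + \sigma_i(\gamma_i \epsilon_0 + \bar{\gamma}_i \epsilon_i),$$

where the last equality holds because $\sigma_i \gamma_i / \sigma_0 = \beta_i$, so the two terms containing $\mu_0$ cancel. Adding 1 to both sides recovers the original expressions for $S^0_T$ and $S^i_T$.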

The set $\mathcal{Q}^n$ of equivalent martingale measures for $\mathbf{S}^n$ has a very simple description: $Q \in \mathcal{Q}^n$ iff $Q \sim P^n$ and

$$E_Q(\epsilon_i - b_i) = 0, \qquad 0 \le i \le n,$$

i.e. the $b_i$ are mean values of $\epsilon_i$ under $Q$. Obviously, $\mathcal{Q}^n \ne \emptyset$ iff $P(\epsilon_i > b_i) > 0$ and $P(\epsilon_i < b_i) > 0$ for all $i \le n$. As usual, we assume that $\mathcal{Q}^n \ne \emptyset$ for all $n$; this implies, in particular, that $|b_i| < N$. Let $F_i$ be the distribution function of $\epsilon_i$. Put

$$\underline{s}_i := \inf\{t : F_i(t) > 0\}, \qquad \bar{s}_i := \inf\{t : F_i(t) = 1\},$$

$\underline{d}_i := b_i - \underline{s}_i$, $\bar{d}_i := \bar{s}_i - b_i$, and $d_i := \underline{d}_i \wedge \bar{d}_i$. In other words, $d_i$ is the distance from $b_i$ to the end points of the interval $[\underline{s}_i, \bar{s}_i]$.

Proposition 5.12 The following assertions hold:
(a) $\inf_i d_i = 0 \;\Leftrightarrow\;$ SAA $\;\Leftrightarrow\; (P^n) \triangle (\overline{Q}^n)$,
(b) $\inf_i d_i > 0 \;\Leftrightarrow\;$ NAA1 $\;\Leftrightarrow\; (P^n) \triangleleft (\overline{Q}^n)$,
(c) $\limsup_i |b_i| = 0 \;\Leftrightarrow\;$ NAA2 $\;\Leftrightarrow\; (\underline{Q}^n) \triangleleft (P^n)$.

The hypothesis that the distributions of $\epsilon_i$ have finite support is important: it excludes the case where the value of every non-trivial portfolio is negative with positive probability. For the proof of this result, we refer the reader to the original paper [37].
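The quantities entering Proposition 5.12 are easy to compute for the first few assets of a market. The sketch below does this for a hypothetical three-asset example in which every noise variable is symmetric on $\{-1, +1\}$ (mean 0, variance 1, support end points $-1$ and $1$). The parameter values and function names are assumptions made for illustration. Since the proposition concerns the infimum and the upper limit over the whole infinite sequence, a finite computation can only show how $b_i$ and $d_i$ are obtained; it cannot decide which case holds.

```python
from math import sqrt


def b_coefficients(mu, sigma, gamma):
    """Return [b_0, ..., b_n]; index 0 is the market portfolio and gamma[0] is unused."""
    b = [-mu[0] / sigma[0]]
    for i in range(1, len(mu)):
        gamma_bar = sqrt(1.0 - gamma[i] ** 2)
        beta = gamma[i] * sigma[i] / sigma[0]
        b.append((mu[0] * beta - mu[i]) / (sigma[i] * gamma_bar))
    return b


def edge_distances(b, s_low, s_high):
    """d_i = min(b_i - lower end point, upper end point - b_i) for each asset."""
    return [min(bi - lo, hi - bi) for bi, lo, hi in zip(b, s_low, s_high)]


# Hypothetical parameters: market portfolio plus two assets.
mu = [0.05, 0.08, 0.02]
sigma = [0.2, 0.3, 0.25]
gamma = [None, 0.6, 0.0]
s_low = [-1.0, -1.0, -1.0]
s_high = [1.0, 1.0, 1.0]

b = b_coefficients(mu, sigma, gamma)
d = edge_distances(b, s_low, s_high)
print(b, d)
```

Every $d_i$ in this example is strictly positive, which matches the standing assumption that each $\mathcal{Q}^n$ is non-empty: each $b_i$ lies strictly inside the support interval.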

Appendix: Facts from convex analysis

1 By definition, a subset $K$ in $\mathbf{R}^n$ (or in a linear space $X$) is a cone if it is convex and stable under multiplication by the non-negative constants. It defines the partial ordering:

$$x \ge_K y \quad\Leftrightarrow\quad x - y \in K;$$

in particular, $x \ge_K 0$ means that $x \in K$. A closed cone $K$ is proper if the linear space $F := K \cap (-K) = \{0\}$, i.e. if the relations $x \ge_K 0$ and $x \le_K 0$ imply that $x = 0$.

Let $K$ be a closed cone and let $\pi : \mathbf{R}^n \to \mathbf{R}^n / F$ be the canonical mapping onto the quotient space. Then $\pi K$ is a proper closed cone. For a set $C$ we denote by $\operatorname{cone} C$ the set of all conic combinations of elements of $C$. If $C$ is convex then $\operatorname{cone} C = \cup_{\lambda \ge 0} \lambda C$.

Let $K$ be a cone. Its dual positive cone $K^* := \{z \in \mathbf{R}^n : zx \ge 0 \ \forall x \in K\}$ is closed (the dual cone $K^{\circ}$ is defined using the opposite inequality, i.e. $K^{\circ} = -K^*$); $K$ is closed if and only if $K = K^{**}$. We use the notations $\operatorname{int} K$ for the interior of $K$ and $\operatorname{ri} K$ for the relative interior (i.e. the interior in $K - K$, the linear subspace generated by $K$).

A closed cone $K$ in the Euclidean space $\mathbf{R}^n$ is proper if and only if there exists a compact convex set $C$ such that $0 \notin C$ and $K = \operatorname{cone} C$. One can take as $C$ the convex hull of the intersection of $K$ with the unit sphere $\{x \in \mathbf{R}^n : |x| = 1\}$. A closed cone $K$ is proper if and only if $\operatorname{int} K^* \ne \emptyset$. We have $\operatorname{ri} K^* = \{w : wx > 0 \ \forall x \in K,\ x \notin F\}$; in particular, if $K$ is proper then $\operatorname{int} K^* = \{w : wx > 0 \ \forall x \in K,\ x \ne 0\}$.

By definition, the cone $K$ is polyhedral if it is the intersection of a finite number of half-spaces $\{x : p_i x \ge 0\}$, $p_i \in \mathbf{R}^n$, $i = 1, \ldots, N$. The Farkas–Minkowski–Weyl theorem: a cone is polyhedral if and only if it is finitely generated.

The following result is a direct generalization of the Stiemke lemma.

Lemma A.1 Let $K$ and $R$ be closed cones in $\mathbf{R}^n$. Assume that $K$ is proper. Then

$$R \cap K = \{0\} \quad\Leftrightarrow\quad (-R^*) \cap \operatorname{int} K^* \ne \emptyset.$$

Proof (⇐) The existence of $w$ such that $wx \le 0$ for all $x \in R$ and $wy > 0$ for all $y$ in $K \setminus \{0\}$ obviously implies that $R$ and $K \setminus \{0\}$ are disjoint.

(⇒) Let $C$ be a convex compact set such that $0 \notin C$ and $K = \operatorname{cone} C$. By the separation theorem (for the case where one set is closed and another is compact) there is a non-zero $z \in \mathbf{R}^n$ such that

$$\sup_{x \in R} zx < \inf_{y \in C} zy.$$

Since $R$ is a cone, the left-hand side of this inequality is zero, hence $z \in -R^*$ and, also, $zy > 0$ for all $y \in C$. The latter property implies that $zy > 0$ for $y \in K$, $y \ne 0$, and we have $z \in \operatorname{int} K^*$.

In the classical Stiemke lemma $K = \mathbf{R}^n_+$ and $R = \{y \in \mathbf{R}^n : y = Bx,\ x \in \mathbf{R}^d\}$ where $B$ is a linear mapping. Usually, it is formulated as the alternative: either there is $x \in \mathbf{R}^d$ such that $Bx \ge_K 0$ and $Bx \ne 0$ or there is $y \in \mathbf{R}^n$ with strictly positive components such that $B^* y = 0$.

Lemma A.1 can be slightly generalized. Let $\pi$ be the natural projection of $\mathbf{R}^n$ onto $\mathbf{R}^n / F$.

Theorem A.2 Let $K$ and $R$ be closed cones in $\mathbf{R}^n$. Assume that the cone $\pi R$ is closed. Then

$$R \cap K \subseteq F \quad\Leftrightarrow\quad (-R^*) \cap \operatorname{ri} K^* \ne \emptyset.$$

Proof It is easy to see that $\pi(R \cap K) = \pi R \cap \pi K$ and, hence,

$$R \cap K \subseteq F \quad\Leftrightarrow\quad \pi R \cap \pi K = \{0\}.$$

By Lemma A.1

$$\pi R \cap \pi K = \{0\} \quad\Leftrightarrow\quad (-\pi R)^* \cap \operatorname{int} (\pi K)^* \ne \emptyset.$$

Since $(\pi R)^* = \pi^{*-1} R^*$ and $\operatorname{int} (\pi K)^* = \pi^{*-1}(\operatorname{ri} K^*)$, the condition in the right-hand side can be written as $\pi^{*-1}((-R^*) \cap \operatorname{ri} K^*) \ne \emptyset$ or, equivalently, $(-R^*) \cap \operatorname{ri} K^* \cap \operatorname{Im} \pi^* \ne \emptyset$. But $\operatorname{Im} \pi^* = (K \cap (-K))^* = K^* - K^* \supseteq \operatorname{ri} K^*$ and we get the result.

Notice that if $R$ is polyhedral then $\pi R$ is also polyhedral, hence closed.

2 The following result is referred to as the Kreps–Yan theorem, see [48], [63], [5]. It holds for arbitrary $p \in [1, \infty]$, $p^{-1} + q^{-1} = 1$, but the cases $p = 1$ and $p = \infty$ are the most important.

Theorem A.3 Let $C$ be a convex cone in $L^p$ closed in $\sigma\{L^p, L^q\}$, containing $-L^p_+$ and such that $C \cap L^p_+ = \{0\}$. Then there is a $\tilde{P} \sim P$ with $d\tilde{P}/dP \in L^q$ such that $\tilde{E}\xi \le 0$ for all $\xi \in C$.

Proof By the Hahn–Banach theorem any non-zero $x \in L^p_+ := L^p(\mathbf{R}_+, \mathcal{F})$ can be separated from $C$: there is a $z_x \in L^q$ such that $E z_x x > 0$ and $E z_x \xi \le 0$ for all $\xi \in C$. Since $C \supseteq -L^p_+$, the latter property yields that $z_x \ge 0$; we may assume $\|z_x\|_q = 1$. By the Halmos–Savage lemma the dominated family $\{P_x = z_x P : x \in L^p_+,\ x \ne 0\}$ contains a countable equivalent family $\{P_{x_i}\}$. But then $z := \sum_i 2^{-i} z_{x_i} > 0$ and we can take $\tilde{P} := zP$.

Recall that the Halmos–Savage lemma, though important, is, in fact, very simple. It suffices to prove its claim for the case of a convex family (in our situation we even have this property). A family $\{P_{x_i}\}$ such that the sequence $I_{\{z_{x_i} > 0\}}$ increases to $\operatorname{ess\,sup} I_{\{z_x > 0\}}$ (existing because of convexity) meets the requirement.

The above theorem has the following “purely geometric” version, [5].

Theorem A.4 Suppose $J$ and $K$ are non-empty convex cones in a separable Banach space $X$ such that $J \cap \overline{K - J} = \{0\}$. Then there is a continuous linear functional $z$ such that $zx > 0$ $\forall x \in J$ and $zx \le 0$ $\forall x \in K$.

The first step of the proof is the same as of the previous theorem: the separation of single points allows us to construct the set of $\{z_x \in X',\ x \in K\}$ with unit norms. The second step is to select a countable weak$^*$ dense subset. This can be done because the separability of $X$ implies that the weak$^*$-topology on the unit ball of $X'$ (always weak$^*$ compact) is metrizable. For the Lebesgue spaces the separability means that the $\sigma$-algebra is countably generated. Specific properties of these spaces allow us, by means of the Halmos–Savage lemma, to avoid such an unpleasant assumption on the $\sigma$-algebra.

References

[1] Ansel, J.-P. and Stricker, C. (1994), Couverture des actifs contingents. Ann. Inst. Henri Poincaré 30, 2, 303–15.
[2] Bouchard-Denize, B. and Touzi, N. (2001), Explicit solution of the multivariate super-replication problem under transaction costs. Preprint.
[3] Chamberlain, G. (1983), Funds, factors, and diversification in arbitrage pricing models. Econometrica 51, 5, 1305–23.
[4] Chamberlain, G. and Rothschild, M. (1983), Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 51, 5, 1281–304.
[5] Clark, S.A. (1992), The valuation problem in arbitrage price theory. J. Math. Economics 22, 463–78.
[6] Cvitanić, J. and Karatzas, I. (1996), Hedging and portfolio optimization under transaction costs: a martingale approach. Mathematical Finance 6, 2, 133–65.

[7] Cvitanić, J., Pham, H. and Touzi, N. (1999), A closed form solution to the problem of super-replication under transaction costs. Finance and Stochastics 3, 1, 35–54.
[8] Dalang, R.C., Morton, A. and Willinger, W. (1990), Equivalent martingale measures and no-arbitrage in stochastic securities market model. Stochastics and Stochastic Reports 29, 185–201.
[9] Davis, M.H.A. and Clark, J.M.C. (1994), A note on super-replicating strategies. Philos. Trans. Roy. Soc. London A 347, 485–94.
[10] Delbaen, F. (1992), Representing martingale measures when asset prices are continuous and bounded. Mathematical Finance 2, 107–30.
[11] Delbaen, F., Kabanov, Yu.M. and Valkeila, S. (2001), Hedging under transaction costs in currency markets: a discrete-time model. Mathematical Finance. To appear.
[12] Delbaen, F. and Schachermayer, W. (1994), A general version of the fundamental theorem of asset pricing. Math. Annalen 300, 463–520.
[13] Delbaen, F. and Schachermayer, W. (1999), A compactness principle for bounded sequence of martingales with applications. Proceedings of the Seminar of Stochastic Analysis, Random Fields and Applications, 1999.
[14] Delbaen, F. and Schachermayer, W. (1998), The fundamental theorem of asset pricing for unbounded stochastic processes. Math. Annalen 312, 215–50.
[15] Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel. Hermann, Paris, 1980.
[16] El Karoui, N. and Quenez, M.-C. (1995), Dynamic programming and pricing of contingent claims in an incomplete market. SIAM Journal on Control and Optimization 33, 1, 27–66.
[17] Émery, M. (1979), Une topologie sur l’espace de semimartingales. Séminaire de Probabilités XIII. Lect. Notes Math., 721, 260–80.
[18] Föllmer, H. and Kabanov, Yu.M. (1998), Optional decomposition and Lagrange multipliers. Finance and Stochastics 2, 1, 69–81.
[19] Föllmer, H. and Kabanov, Yu.M. (1996), Optional decomposition theorems in discrete time. Atti del convegno in onore di Oliviero Lessi, Padova, 25–26 marzo 1996, 47–68.
[20] Föllmer, H. and Kramkov, D.O. (1997), Optional decomposition theorem under constraints. Probability Theory and Related Fields 109, 1, 1–25.
[21] Gordan, P. (1873), Über die Auflösung linearer Gleichungen mit reellen Koefficienten. Math. Annalen 6, 23–8.
[22] Hall, P. and Heyde, C.C. Martingale Limit Theory and Its Applications. Academic Press, New York, 1980.
[23] Harrison, M. and Pliska, S. (1981), Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and their Applications 11, 215–60.
[24] Hubalek, F. and Schachermayer, W. (1998), When does convergence of asset price processes imply convergence of option prices? Mathematical Finance 8, 4, 215–33.
[25] Huberman, G. (1982), A simple approach to arbitrage pricing theory. Journal of Economic Theory 28, 1, 183–91.
[26] Ingersoll, J.E., Jr. (1984), Some results in the theory of arbitrage pricing. Journal of Finance 39, 1021–39.
[27] Ingersoll, J.E., Jr. Theory of Financial Decision Making. Rowman and Littlefield, 1989.
[28] Jacod, J. and Shiryaev, A.N. Limit Theorems for Stochastic Processes. Springer, Berlin–Heidelberg–New York, 1987.
[29] Jacod, J. and Shiryaev, A.N. (1998), Local martingales and the fundamental asset pricing theorem in the discrete-time case. Finance and Stochastics 2, 3, 259–73.
[30] Jouini, E. and Kallal, H. (1995), Martingales and arbitrage in securities markets with transaction costs. J. Economic Theory 66, 178–97.
[31] Jouini, E. and Kallal, H. (1995), Arbitrage in securities markets with short sale constraints. Mathematical Finance 5, 3, 197–232.
[32] Jouini, E. and Kallal, H. (1999), Viability and equilibrium in securities markets with frictions. Mathematical Finance 9, 3, 275–92.
[33] Kabanov, Yu.M. On the FTAP of Kreps–Delbaen–Schachermayer. Statistics and Control of Random Processes. The Liptser Festschrift. Proceedings of Steklov Mathematical Institute Seminar, World Scientific, 1997, 191–203.
[34] Kabanov, Yu.M. (1999), Hedging and liquidation under transaction costs in currency markets. Finance and Stochastics 3, 2, 237–48.
[35] Kabanov, Yu.M. and Kramkov, D.O. (1994), Large financial markets: asymptotic arbitrage and contiguity. Probability Theory and its Applications 39, 1, 222–9.
[36] Kabanov, Yu.M. and Kramkov, D.O. (1994), No-arbitrage and equivalent martingale measure: an elementary proof of the Harrison–Pliska theorem. Probability Theory and Its Applications 39, 3, 523–7.
[37] Kabanov, Yu.M. and Kramkov, D.O. (1998), Asymptotic arbitrage in large financial markets. Finance and Stochastics 2, 2, 143–72.
[38] Kabanov, Yu.M. and Last, G. (2001a), Hedging in a model with transaction costs. Preprint.
[39] Kabanov, Yu.M. and Last, G. (2001b), Hedging under transaction costs in currency markets: a continuous-time model. Mathematical Finance. To appear.
[40] Kabanov, Yu.M., Liptser, R.Sh. and Shiryayev, A.N. (1981), On the variation distance for probability measures defined on a filtered space. Probability Theory and Related Fields 71, 19–36.
[41] Kabanov, Yu.M. and Stricker, Ch. (2001a), The Harrison–Pliska arbitrage pricing theorem under transaction costs. J. Math. Econ. To appear.
[42] Kabanov, Yu.M. and Stricker, Ch. (2001b), A teachers’ note on no-arbitrage criteria. Séminaire de Probabilités. To appear.
[43] Kabanov, Yu.M. and Stricker, Ch. (2001c), On equivalent martingale measures with bounded densities. Séminaire de Probabilités. To appear.
[44] Klein, I. (2001), A fundamental theorem of asset pricing for large financial markets. Preprint.
[45] Klein, I. and Schachermayer, W. (1996), Asymptotic arbitrage in non-complete large financial markets. Probability Theory and its Applications 41, 4, 927–34.
[46] Klein, I. and Schachermayer, W. (1996), A quantitative and a dual version of the Halmos–Savage theorem with applications to mathematical finance. Annals of Probability 24, 2, 867–81.
[47] Kramkov, D.O. (1996), Optional decomposition of supermartingales and hedging in incomplete security markets. Probability Theory and Related Fields 105, 4, 459–79.
[48] Kreps, D.M. (1981), Arbitrage and equilibrium in economies with infinitely many commodities. J. Math. Economics 8, 15–35.
[49] Levental, S. and Skorohod, A.V. (1997), On the possibility of hedging options in the presence of transaction costs. The Annals of Applied Probability 7, 410–43.
[50] Mémin, J. (1980), Espace de semimartingales et changement de probabilité. Zeitschrift für Wahrscheinlichkeitstheorie und Verw. Geb., 52, 9–39.
[51] Pshenychnyi, B.N. Convex Analysis and Extremal Problems. Nauka, Moscow, 1980 (in Russian).
[52] Rockafellar, R.T. Convex Analysis. Princeton University Press, Princeton, 1970.
[53] Rogers, L.C.G. (1994), Equivalent martingale measures and no-arbitrage. Stochastic and Stochastics Reports 51, 41–51.

[54] Ross, S.A. (1976), The arbitrage theory of asset pricing. Journal of Economic Theory 13, 1, 341–60.
[55] Sin, C.A. Strictly local martingales and hedge ratios on stochastic volatility models. PhD-dissertation, Cornell University, 1996.
[56] Schachermayer, W. (1992), A Hilbert space proof of the fundamental theorem of asset pricing in finite discrete time. Insurance: Mathematics and Economics 11, 249–57.
[57] Shiryaev, A.N. Probability. Springer, Berlin–Heidelberg–New York, 1984.
[58] Shiryaev, A.N. Essentials of Stochastic Finance. World Scientific, Singapore, 1999.
[59] Soner, H.M., Shreve, S.E. and Cvitanić, J. (1995), There is no non-trivial hedging portfolio for option pricing with transaction costs. The Annals of Applied Probability 5, 327–55.
[60] Stricker, Ch. (1990), Arbitrage et lois de martingale. Annales de l’Institut Henri Poincaré. Probabilités et Statistiques 26, 3, 451–60.
[61] Schrijver, A. Theory of Linear and Integer Programming. Wiley, 1986.
[62] Stiemke, E. (1915), Über positive Lösungen homogener linearer Gleichungen. Math. Annalen 76, 340–2.
[63] Yan, J.A. (1980), Caractérisation d’une classe d’ensembles convexes de $L^1$ et $H^1$. Séminaire de Probabilités XIV. Lect. Notes Math., 784, 260–80.

2 Market Models with Frictions: Arbitrage and Pricing Issues

Elyès Jouini and Clotilde Napp

1 Introduction

The Fundamental Theorem of Asset Pricing, which originates in the Arrow–Debreu model (Debreu (1959)) and is further formalized in (among others) Harrison and Kreps (1979), Kreps (1981), Harrison and Pliska (1981), Duffie and Huang (1986), Dybvig and Ross (1987), Dalang, Morton and Willinger (1989), Back and Pliska (1990), Stricker (1990), Delbaen (1992), Lakner (1993) and Delbaen and Schachermayer (1994, 1998), asserts that the absence of free lunch in a frictionless (and complete) securities market model is equivalent to the existence of an equivalent martingale measure for the normalized securities price processes. The only arbitrage free and viable pricing rule on the set of contingent claims, which is a linear space, is then equal to the expected value with respect to the unique equivalent martingale measure.

In this chapter, we study some foundational issues in the theory of asset pricing in general models with flows as well as in securities market models with frictions. We consider financial models, where any investment opportunity is described by the cash flow that it generates. For instance, in such models, the investment opportunity, which consists, in a perfect financial model, of buying at time $t_1$ one unit of a risky asset, whose price process is given by $(S_t)_{t \ge 0}$, and selling at time $t_2$ with $t_1 \le t_2$ the unit bought, is described by the process $(\Phi_t)_{t \ge 0}$ which is null outside $\{t_1, t_2\}$ and which satisfies $\Phi_{t_1} = -S_{t_1}$ and $\Phi_{t_2} = S_{t_2}$.

Sections 2 and 3 deal with a convex cone framework, i.e. a framework where the set of all available investments consists of a convex cone. A large class of imperfect market models, that we shall denote by $\mathcal{I}$, can fit in this framework: models with imperfections on the numéraire like no borrowing or different borrowing and lending rates, models with dividends, short-sale constraints, convex cone constraints, proportional transaction costs.


Section 2 is devoted to the characterization of the no-free-lunch assumption first in a general convex cone framework with flows, then in all the models with imperfections belonging to I, and is taken from Jouini and Napp (2001) and Napp (2000). We consider first a quite general model; the investment opportunities are not specifically related to the buying and selling of securities on a financial market. The time horizon is not supposed to be finite. The framework is the one of continuous time. We don’t assume that there exists a num´eraire, enabling investors to transfer money from one date to another, and even if such possibilities exist, we do not assume that the lending rate is equal to the borrowing rate or that we have possibilities to borrow. It is proved that the absence of free lunch in a general convex cone framework with flows is essentially equivalent to the existence of a discount process such that the “net present value” of any investment opportunity is nonpositive. This result is then applied to obtain the Fundamental Theorem of Asset Pricing for all cases of market imperfections in I. In each case, we find that there is no free lunch if and only if a given specific convex set of discount processes is nonempty. For instance, in the case with short-sale constraints, we find that the absence of free lunch is equivalent to the existence of a discount process such that the discounted price process of any security that cannot be sold short (resp. that can only be sold short) is a supermartingale (resp. a submartingale). Section 3 is devoted to pricing issues first in a general convex cone framework with flows, then in all the models with imperfections belonging to I, and is taken from Napp (2000). Section 3.1 is in the spirit of Harrison and Kreps (1979); we generalize existing results by considering general investment flows, and by taking almost any kind of imperfection into account. 
We consider a “primitive” market, consisting of a certain set of investment opportunities and we want to give a price to an additional contingent flow by using arbitrage considerations. More precisely, we define an admissible price for an additional contingent flow $\Phi$ as a price which is compatible with the assumption of no-arbitrage (or no free lunch) in the “full” market consisting of our primitive market and $\Phi$. For a general contingent flow, we obtain an interval of admissible prices, which is given by the “net present value” of the flow under all admissible discount processes. We then apply this result to obtain arbitrage intervals for the price of contingent claims in market models with frictions in $\mathcal{I}$. Section 3.2 is devoted to the characterization of the obtained arbitrage bounds in terms of superreplication cost. We start by defining in a general model with flows the so-called superreplication cost, which essentially corresponds to the minimum initial wealth needed to cover all future contingent flows. We show that for any contingent flow, it is equal to the upper bound of the arbitrage interval. The notion of superreplication cost was first introduced by Kreps (1981), for classical contingent claims and in the context of incomplete markets (with no


other imperfection). In a diffusion framework, and still with no other imperfection than incompleteness, El Karoui and Quenez (1995) obtain a dual formulation for the superreplication price; in Delbaen (1992) and Delbaen and Schachermayer (1994), this result is obtained in a more general framework. In the spirit of Kreps (1981), Jouini and Kallal (1995a,b) take into account the cases of proportional transaction costs and short sale constraints. For transaction costs, the problem was first introduced by Bensaïd et al. (1992), who show that in a binomial model with transaction costs, perfect replication is not optimal. Cvitanić and Karatzas (1996) give, in a diffusion framework, a dual formulation for the superreplication price. Delbaen, Kabanov and Valkeila (2001) and Kabanov (1999) generalize this result to the multivariate case, in discrete as well as in continuous time, and with a semimartingale price process. For convex constraints, and still in a diffusion framework, the dual formulation is obtained in Cvitanić and Karatzas (1993). In a more general framework, the result is obtained in Föllmer and Kramkov (1997).

Section 4 deals with economies with fixed transaction costs, which do not fall in the preceding framework, since the set of all available investments is not a convex cone. It is adapted from Jouini, Kallal and Napp (2000). We first obtain a characterization of the no-free-lunch assumption in a general model with flows. We find that the assumption of no-arbitrage is essentially equivalent to the existence of a family of nonnegative “discount processes” such that the net present value of any available investment is nonnegative. Then we apply this result to a securities market model where investors are submitted to both fixed and proportional transaction costs. In that case, the nonnegative discount processes can be interpreted as absolutely continuous martingale measures.
Finally, we study pricing issues in securities market models with fixed transaction costs. We adopt an axiomatic approach. We define admissible pricing rules on the set of attainable contingent claims as the price functionals that are arbitrage free and are lower than or equal to the superreplication cost. Indeed, no rational agent would pay more than its superreplication cost for a contingent claim since there is a cheaper way to achieve at least the same payoff using a trading strategy. We then show that the only admissible pricing rules on the set of attainable contingent claims are those that are equal to the sum of an expected value with respect to any absolutely continuous martingale measure and of a bounded fixed cost functional.

2 The Fundamental Theorem of Asset Pricing

We start by describing our general model with flows in a convex cone framework, and in such a model, we characterize the assumption of no free lunch. Then we apply this result to all cases of market imperfections belonging to the class $\mathcal{I}$.

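2.1 In a general “convex cone model” with flows

We adopt the framework of Jouini and Napp (2001), Napp (2000) or Jouini et al. (2000, Section 1). We introduce some notation. For a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$, define the measure space $(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$ as the direct sum of the probability spaces $(\Omega, \mathcal{F}_t, P)$, i.e. $\hat{\Omega}$ is the disjoint union of continuum many copies $(\Omega_t)_{t \ge 0}$ of $\Omega$, $\hat{\mathcal{F}}$ is the sigma-algebra of sets $A \subseteq \hat{\Omega}$ such that $A \cap \Omega_t \in \mathcal{F}_t$ for each $t \ge 0$, and $\hat{\mu}$ induces on each $(\Omega_t, \hat{\mathcal{F}}|_{\Omega_t})$ the original probability measure $P$. We then may represent the Banach space $X \equiv L^1(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$ as the space of all families $\Phi = (\Phi_t)_{t \ge 0}$ such that $\Phi_t \in L^1(\Omega, \mathcal{F}_t, P)$ and

$$\|\Phi\|_{L^1(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})} = \sum_{t \ge 0} \|\Phi_t\|_{L^1(\Omega, \mathcal{F}_t, P)} < \infty.$$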

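The finiteness of the above sum implies in particular that $\Phi_t = 0$ for all but countably many $t$ in $\mathbb{R}_+$. The dual space of $X$ may be represented as $Y \equiv L^\infty(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$, which is defined as the space of all families $g = (g_t)_{t \ge 0}$ such that $g_t \in L^\infty(\Omega, \mathcal{F}_t, P)$ and

$$\|g\|_{L^\infty(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})} = \sup_{t \ge 0} \|g_t\|_{L^\infty(\Omega, \mathcal{F}_t, P)} < \infty.$$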

The scalar product is defined by $\langle \Phi, g \rangle_{X,Y} = \sum_{t \ge 0} \langle \Phi_t, g_t \rangle$. Elements of $X$ and $Y$ are defined up to a modification.

Let $\check{X} \equiv L^1(\hat{\Omega}_0, \hat{\mathcal{F}}_0, \hat{\mu}_0)$, where $(\hat{\Omega}_0, \hat{\mathcal{F}}_0, \hat{\mu}_0)$ is the direct sum of the probability spaces $(\Omega, \mathcal{F}_t, P)$, $t > 0$. Then $\check{Y} \equiv L^\infty(\hat{\Omega}_0, \hat{\mathcal{F}}_0, \hat{\mu}_0)$ denotes the dual space of $\check{X}$. For $x, y \in X$ or $Y$ (resp. $\check{X}$ or $\check{Y}$), we write $x \ge y$ if for all $t \ge 0$ (resp. $t > 0$), $x_t \ge y_t$ a.s. $P$. For any subset $Z$ of $X$, $Y$, $\check{X}$ or $\check{Y}$, we denote by $Z_+$ (resp. $Z_-$) the set of $x \in Z$ such that $x \ge 0$ (resp. $x \le 0$).

We consider a model in which agents face investment opportunities described by their cash flows. A probability space $(\Omega, \mathcal{F}, P)$ is specified and fixed. The set $\Omega$ represents all possible states of the world. An information structure, which describes how information is revealed to investors, is given by a filtration $(\mathcal{F}_t)_{t \ge 0}$ satisfying the “usual conditions” and such that $\mathcal{F}_0 = \{\emptyset, \Omega\}$. We consider investments of the following form:

Definition 2.1 An investment is a process $\Phi = (\Phi_t)_{t \ge 0} \in X$. For each $t \ge 0$, the random variable $\Phi_t$ corresponds to the cash flow generated at time $t$ by the investment $\Phi$; if $\Phi_t(\omega) = k$, this means that the investor receives $k$ at date $t$ if $k$ is nonnegative and pays $-k$ at date $t$ if $k$ is nonpositive.

An arbitrage opportunity is as usual a possibility to find an investment that yields a positive gain in some circumstances without a countervailing threat of loss in
other circumstances. In our framework, an arbitrage opportunity would consist of a nonnegative nonnull available investment.

We consider a convex cone $J$ of available investments: this amounts to saying that an investor has a right to subscribe to (a finite number of) different investment plans and that he can decide at the starting date of any investment opportunity what amount of this particular investment he wants to buy. We are led to consider convex cones in order to take into account the fact that investors are not necessarily able to sell an investment plan (consider for instance the case of short sale constraints or transaction costs).

In order to obtain the Fundamental Theorem of Asset Pricing in this context, we make the additional assumption that there is in the convex cone $J$ some possibility of transferring some money. More precisely, we introduce the following assumption.

Assumption A: there exists a sequence $d = (d_n)_{n \ge 0}$ such that for all $t^* \ge 0$, for all $B_{t^*}$ in $\mathcal{F}_{t^*}$ of positive probability, there exists $\Phi$ in $J$ such that $\Phi_t = 0\ \forall t < t^*$, $\Phi_{t^*} = 0$ outside $B_{t^*}$, $\Phi_t \ge 0\ \forall t > t^*$ and $\exists d_n \in d$, $P(\Phi_{d_n} > 0) > 0$.

In words, this means that there exists a sequence of trading dates such that, for every date and for every event at that date, there exists an investment plan in our set of available investments that starts at that date and in that event, that can take any value at that date and in that event, but that is nonnegative after that date and nonnull at one date belonging to the above mentioned sequence of dates. This assumption is not too restrictive. See Jouini and Napp (2001) for more details on this assumption.

We do not specify the elements of $J$ at this stage. The assumption of no-arbitrage for $J$ can be written $J \cap X_+ = \{0\}$ or equivalently $(J - X_+) \cap X_+ = \{0\}$. Since a free lunch denotes the possibility of getting arbitrarily close to an arbitrage opportunity, we introduce the following definition.

Definition 2.2 There is no free lunch for $J$ if and only if $\overline{J - X_+} \cap X_+ = \{0\}$, where the bar denotes the closure for the norm topology in $X$.

We now characterize the absence of free lunch. Notice that since we do not necessarily have the opportunity to transfer money from one time to another, we cannot consider “net gains” anymore, and we have to get an analog of the Kreps–Yan theorem (Yan (1980), Kreps (1981)) in a more complex space than the classical $L^1(\Omega, \mathcal{F}, P)$ for a probability (or sigma-finite measure) space $(\Omega, \mathcal{F}, P)$. In our general context with investments in $X$, we obtain the following Fundamental Theorem of Asset Pricing.

Theorem 2.3 Under Assumption A, there is no free lunch for $J$ if and only if there exists a positive process $g = (g_t)_{t \ge 0}$ in $Y$ such that $g|_J \le 0$.

Note that positive means here that $g$, seen as a linear functional on $X$, is positive, or equivalently that for all $t$, $g_t > 0$ a.s. $P$. Since for all $\Phi \in J$, $\langle \Phi, g \rangle_{X,Y} = E\left[\sum_{t \ge 0} g_t \Phi_t\right]$, Theorem 2.3 means that the absence of free lunch (for $J$) is essentially equivalent to the existence of a discount process under which the “net present value” of any available investment (in $J$) is nonpositive. We shall denote by $G^J$ the set of all “admissible discount processes”, i.e. $G^J \equiv \{g \in Y,\ g > 0,\ g|_J \le 0\}$. If there is no free lunch, then according to Theorem 2.3, $G^J$ is nonempty.
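
To make the condition $g|_J \le 0$ concrete, the following sketch checks it in a toy model with two dates and two states. Everything in it is an illustrative assumption rather than part of the chapter: the two generators (lending one unit, and buying one unit of a stock that cannot be sold short), the probabilities, and the helper names `net_present_value` and `is_admissible`. Since the cone is generated by finitely many investments and $\langle \cdot, g \rangle$ is linear, it is enough to test the generators.

```python
# Toy check of the condition g|_J <= 0 appearing in Theorem 2.3.
# Hypothetical model: dates {0, 1}, two states at date 1, trivial F_0.

PROBS = (0.5, 0.5)  # P(up), P(down): assumed values

# A flow is (cash flow at date 0, (cash flow at date 1 in each state)).
LEND = (-1.0, (1.0, 1.0))        # lend 1 at date 0, receive 1 at date 1
BUY_STOCK = (-1.0, (1.2, 0.9))   # buy at 1, worth 1.2 (up) or 0.9 (down)
GENERATORS = (LEND, BUY_STOCK)   # no borrowing and no short sale


def net_present_value(flow, g, probs=PROBS):
    """Return <flow, g> = g_0 * flow_0 + E[g_1 * flow_1]."""
    flow0, flow1 = flow
    g0, g1 = g
    return g0 * flow0 + sum(p * gw * fw for p, gw, fw in zip(probs, g1, flow1))


def is_admissible(g, generators=GENERATORS, tol=1e-12):
    """True if g is strictly positive and nonpositive on every generator."""
    g0, g1 = g
    positive = g0 > 0 and all(gw > 0 for gw in g1)
    return positive and all(net_present_value(f, g) <= tol for f in generators)


g_good = (1.0, (0.8, 1.0))  # discounts the up state more heavily
g_bad = (1.0, (1.0, 1.0))   # no discounting: buying the stock has positive NPV

print(is_admissible(g_good), is_admissible(g_bad))
```

In this toy model the first candidate is an admissible discount process and the second is not, because without discounting the long stock position has a strictly positive net present value.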

2.2 Application to the characterization of the no-free-lunch assumption in all cases of market imperfections in $\mathcal{I}$

As our investment opportunities are supposed to be very general, it is shown in Jouini and Napp (1998) that most market models involving imperfections can fit in the model for a specific convex cone of investments $J$ satisfying Assumption A. This is the case for the following set (that we shall denote by $\mathcal{I}$) of imperfect market models: models with imperfections concerning the numéraire (no borrowing, different borrowing and lending rates), models with dividends, short-sale constraints, convex cone constraints, proportional transaction costs.

Let us see how for instance Theorem 2.3, obtained in a general setting, can be applied to the case of short sale constraints. As in Jouini and Kallal (1995b), we consider a model of financial market where two sorts of securities can be traded. Short selling the first type of securities is not allowed, i.e. they can only be held in nonnegative amounts, whereas the second type of securities can only be held in nonpositive amounts. The model includes situations where holding negative amounts of a security is possible but costly as well as situations where some (or all) securities are not subject to any constraints, since we may include a security twice in the model, in the first and in the second set of securities. For $1 \le k \le n$ (resp. $n + 1 \le k \le N$), we denote by $S^k$ the price process of the security $k$ that can only be held in nonnegative (resp. nonpositive) amounts. We assume that for $k \in \{1, \dots, N\}$, $S^k_t$ belongs to $L^1(\Omega, \mathcal{F}_t, P)$ for all $t$, and that $S^1 \equiv 1$ (i.e. there are lending opportunities). For all $t_1 \le t_2$, for all bounded nonnegative $\mathcal{F}_{t_1}$-measurable real-valued random variables $\theta$, we let $\Phi^{(k;\theta,t_1,t_2)}$ denote the process given by

$$\Phi_t^{(k;\theta,t_1,t_2)} = -\theta S^k_{t_1} 1_{t=t_1} + \theta S^k_{t_2} 1_{t=t_2} \quad \text{for } 1 \le k \le n$$

and

$$\Phi_t^{(k;\theta,t_1,t_2)} = \theta S^k_{t_1} 1_{t=t_1} - \theta S^k_{t_2} 1_{t=t_2} \quad \text{for } n + 1 \le k \le N.$$

We assume that the set $J_S$ is the convex cone generated by all these investments. Then $J_S$ satisfies Assumption A and by an immediate application of Theorem 2.3, we get that there is no free lunch for $J_S$, or equivalently that there is no free lunch in a model with short sale constraints, if and only if the set $G^{J_S}$ is nonempty, where $G^{J_S}$ denotes the set of positive processes $g \in Y$ such that for all securities $k$ that cannot
be sold short (i.e. $k \le n$), $g S^k$ is a supermartingale and for all securities $k$ that can only be sold short (i.e. $n + 1 \le k$), $g S^k$ is a submartingale.

We adopt in Jouini and Napp (2001) a similar approach for all other market imperfections in $\mathcal{I}$. Each time, we introduce a specific set of available investments corresponding to the considered imperfection, we apply Theorem 2.3 and obtain more or less directly a specific characterization of the no-free-lunch condition in these imperfect market models. In each case, we find that there is no free lunch if and only if a given specific convex set of discount processes$^1$ is nonempty.
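
The supermartingale and submartingale conditions on $g S^k$ can be checked numerically on a small tree. The sketch below uses a two-period binomial model with a deterministic discount sequence; the tree parameters, the probability of an up move and all function names are assumptions made for illustration and are not taken from the chapter. It shows that a stock with positive expected growth needs genuine discounting to satisfy the long-only (supermartingale) condition, while without discounting it satisfies the short-only (submartingale) condition.

```python
# Check the super/submartingale conditions on g * S^k in a two-period
# binomial tree. All numbers are assumed for illustration.

P_UP = 0.5  # probability of an up move at each node (assumption)


def tree(x0, up, down):
    """Two-period tree: value at date 0, values at date 1, values at date 2."""
    level1 = {"u": x0 * up, "d": x0 * down}
    level2 = {a + b: level1[a] * (up if b == "u" else down)
              for a in "ud" for b in "ud"}
    return x0, level1, level2


def discounted(process, g):
    """Multiply a tree process by a deterministic discount sequence g_0..g_2."""
    x0, x1, x2 = process
    return (g[0] * x0,
            {k: g[1] * v for k, v in x1.items()},
            {k: g[2] * v for k, v in x2.items()})


def drifts(process, p=P_UP):
    """E[X_{t+1} | F_t] - X_t at the root and at the two date-1 nodes."""
    x0, x1, x2 = process
    root = p * x1["u"] + (1 - p) * x1["d"] - x0
    nodes = [p * x2[a + "u"] + (1 - p) * x2[a + "d"] - x1[a] for a in "ud"]
    return [root] + nodes


def is_supermartingale(process, tol=1e-12):
    return all(d <= tol for d in drifts(process))


def is_submartingale(process, tol=1e-12):
    return all(d >= -tol for d in drifts(process))


stock = tree(100.0, 1.10, 0.95)   # expected growth of 2.5% per period
g_flat = (1.0, 1.0, 1.0)          # no discounting
g_disc = (1.0, 0.97, 0.97 ** 2)   # 3% discount per period

print(is_supermartingale(discounted(stock, g_disc)))  # long-only condition
print(is_submartingale(discounted(stock, g_flat)))    # short-only condition
```

Note that the decreasing sequence `g_disc` is itself a supermartingale, which is the condition imposed by the lending security $S^1 \equiv 1$.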

2.3 A few remarks and extensions

• In Jouini et al. (2000), we adopt a new topology on $X$ for the definition of a free lunch. The idea is to weaken the topology on $X$; to motivate this idea, recall that we have considered the norm topology on $L^1(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$ so that its dual equals $L^\infty(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$. Considering the elements $g = (g_t)_{t \in \mathbb{R}_+} \in L^\infty(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$ as functions on $\Omega \times \mathbb{R}_+$, note that, for fixed $\omega \in \Omega$, the function $t \mapsto g_t(\omega)$ does not obey any continuity or measurability requirements (apart from being uniformly bounded). The space $Y = L^\infty(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$ seems too big for a useful economic interpretation and should be replaced by a space $Y$ of more regular processes, e.g., the adapted bounded processes $(y_t)_{t \in \mathbb{R}_+}$ which almost surely have càd (right continuous) or càg (left continuous) or continuous trajectories. This leads us to consider the space $X = L^1(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$ in duality with the space $Y$ proposed above and to equip $X$ with a topology $\tau$ compatible with the dual pair $\langle X, Y \rangle$. We prove in Jouini et al. (2000) that in this setting we do have a positive result of Yan type, hence a characterization of the no-free-lunch assumption, without Assumption A; more precisely, we prove that for all closed convex cones $C$ in $X$ such that $C \supseteq X_-$, if $C \cap X_+ = \{0\}$, then we can find a strictly positive linear functional $y \in Y_{++}$ such that $y|_C \le 0$.

• Still in Jouini et al. (2000), we generalize the framework of Section 2.1, by considering a space of investments given by a space of measures. More precisely, we take $X$ given by $M(\mathbb{R}_+ \times \Omega, \mathcal{O})$, the space of equivalence classes of finite measures $\mu$ on the optional sigma-algebra $\mathcal{O}$, modulo the measures supported by evanescent sets. Note that this enables us to model in $X$ continuous time payment streams (which may or may not be absolutely continuous with respect to Lebesgue measure). We obtain a characterization of the no-free-lunch assumption in such a context.
• We study in Napp (2000) the links between the extremality or the uniqueness of the “admissible discount process” given by the absence of free lunch and the completeness of the market, in the case where the convex cone $J$ of available investments is a linear subspace of $X$. Similar results have been obtained in Jacod (1979), Harrison and Pliska (1981), Delbaen (1992) and Delbaen and Schachermayer (1994).

1 See Section 4 for a description of this set in the transaction costs case.

3 Arbitrage intervals and superreplication cost

Now that we have characterized the absence of free lunch, we shall turn to pricing issues, still in the framework of Section 2.

3.1 Arbitrage intervals

We start with the general framework with a convex cone of available flows. We adopt the approach of Harrison and Kreps (1979). We assume that we are faced with a so-called primitive financial market consisting of a convex cone $C$ of available investment opportunities satisfying Assumption A. We suppose that there is no free lunch in the primitive market or equivalently that there is no free lunch for $C$, so that according to Theorem 2.3, the set $G^C$ is nonempty. In addition to this primitive market, we consider a contingent flow in the form of some process $\check{\Phi} = (\Phi_t)_{t > 0} \in \check{X}$. The aim of this subsection is to give a “fair” price to this additional contingent flow by only using arbitrage considerations.

We say that $(-\Phi_0)$ is a fair (buying) price for $\check{\Phi} \in \check{X}$ if there is no free lunch in the so-called full market consisting of the convex cone generated in $X$ by $C$ and $\Phi \equiv (\Phi_t)_{t \ge 0}$. These values of $(-\Phi_0)$ can be seen as the price to pay at date 0 in order to have access to the flows $\Phi_t$ at each date $t > 0$, in a way that is compatible with the no-free-lunch condition. For all $\check{\Phi} \in \check{X}$, let

$$l_{\check{\Phi}} \equiv \inf_{g \in G^C} \left\langle \check{\Phi}, \left(\frac{g_t}{g_0}\right)_{t > 0} \right\rangle_{\check{X},\check{Y}} \quad \text{and} \quad u_{\check{\Phi}} \equiv \sup_{g \in G^C} \left\langle \check{\Phi}, \left(\frac{g_t}{g_0}\right)_{t > 0} \right\rangle_{\check{X},\check{Y}}.$$

For simplicity of notation, we shall indifferently write $\left\langle \check{\Phi}, \left(\frac{g_t}{g_0}\right)_{t > 0} \right\rangle_{\check{X},\check{Y}}$ or $\left\langle \check{\Phi}, \frac{g}{g_0} \right\rangle_{\check{X},\check{Y}}$.

Lemma 3.1 A price $(-\Phi_0)$ is a fair price for $\check{\Phi}$ if and only if there exists $g \in G^C$ such that $(-\Phi_0) \ge \left\langle \check{\Phi}, \frac{g}{g_0} \right\rangle_{\check{X},\check{Y}}$. Any fair price $(-\Phi_0)$ satisfies $(-\Phi_0) \ge l_{\check{\Phi}}$. Conversely, any price $(-\Phi_0) > l_{\check{\Phi}}$ is a fair price for $\check{\Phi}$.

We have obtained a lower bound on the value of any fair (buying) price. Any fair buying price for a contingent flow is a price that is greater than or equal to the net present value of the flow with respect to some admissible discount process. In a natural way, a fair selling price for $\check{\Phi} \in \check{X}$ is the opposite of a fair (buying) price for $-\check{\Phi} \equiv (-\Phi_t)_{t > 0}$. By applying Lemma 3.1 to $-\check{\Phi}$, we get that any fair selling
price for $\check{\Phi}$ satisfies $(-\Phi)_0 \le u_{\check{\Phi}}$ and that, conversely, any price $(-\Phi)_0 < u_{\check{\Phi}}$ is a fair selling price for $\check{\Phi}$. Notice that if $\check{\Phi}$ can be bought and sold, then by arbitrage considerations, its buying price necessarily lies above its selling price.

We say that $(-\Phi_0)$ is a fair buying–selling price for $\check{\Phi} \in \check{X}$ if there is no free lunch in the market consisting of the convex cone generated in $X$ by $C$, $\Phi$ and $-\Phi$. It corresponds to the price at which $\check{\Phi}$ can be bought and sold without generating any free lunch.

Corollary 3.2 A price $(-\Phi_0)$ is a fair buying–selling price for $\check{\Phi}$ if and only if there exists $g \in G^C$ such that $(-\Phi_0) = \left\langle \check{\Phi}, \frac{g}{g_0} \right\rangle_{\check{X},\check{Y}}$. Any fair buying–selling price $(-\Phi_0)$ belongs to $[l_{\check{\Phi}}, u_{\check{\Phi}}]$. Conversely, if $l_{\check{\Phi}} = u_{\check{\Phi}}$, then there is a unique fair buying–selling price equal to $l_{\check{\Phi}}$, and if $l_{\check{\Phi}} < u_{\check{\Phi}}$, then any price $(-\Phi_0) \in\ ]l_{\check{\Phi}}, u_{\check{\Phi}}[$ is a fair buying–selling price for $\check{\Phi}$.

If $G^C$ is reduced to a singleton, then there exists a unique fair buying–selling price for any $\check{\Phi} \in \check{X}$. If $G^C$ is not reduced to a singleton, we only obtain arbitrage intervals for the price of contingent flows. For any contingent flow which can be bought and sold, its arbitrage interval consists of its net present value under all admissible discount processes in $G^C$.

We can now apply these results for the pricing of contingent claims in any market model in $\mathcal{I}$. Let $T \in \mathbb{R}_+^*$. A contingent claim will denote any random variable $H$ in $L^1(\Omega, \mathcal{F}_T, P)$, corresponding to the payoff at date$^2$ $T$. We want to give a fair price to a contingent claim $H$ by only using arbitrage considerations. We still assume that we are faced with a so-called primitive financial market consisting of a convex cone $C$ of available investment opportunities satisfying Assumption A and such that the set $G^C$ is nonempty.
In addition to this primitive market, we assume that investors have access to the contingent claim $H$ so that the set of all available investment opportunities consists of the convex cone generated by $C$ and the contingent flow $\Phi^H \in X$ given by $\Phi^H_T = H$ and $\Phi^H_t = 0$ for all $t \notin \{0, T\}$. We say that $-\Phi^H_0$ is a fair (buying) price for $H$ if it is a fair price for $(\Phi^H_t)_{t > 0} \in \check{X}$. By applying Lemma 3.1 to the investment opportunity $(\Phi^H_t)_{t > 0}$ in $\check{X}$, we immediately get the following result.

Corollary 3.3 Any fair buying price $-\Phi^H_0$ for a contingent claim $H$ satisfies $-\Phi^H_0 \ge \inf_{g \in G^C} E\left[\frac{g_T}{g_0} H\right]$. Any fair selling price for $H$ satisfies $\Phi^{-H}_0 \le \sup_{g \in G^C} E\left[\frac{g_T}{g_0} H\right]$. If $H$ can be bought and sold at the same price, then $-\Phi^H_0 \in \left[\inf_{g \in G^C} E\left[\frac{g_T}{g_0} H\right], \sup_{g \in G^C} E\left[\frac{g_T}{g_0} H\right]\right]$.

2 Notice that contingent claims whose payoffs belong to $\check{X}$, without necessarily being related to a unique date $T$, also fall in our framework.

We are now able to use the specific characterization of the set $G^C$ obtained in the different imperfect market models in $\mathcal{I}$ (see Jouini and Napp (2001)) to obtain in each case specific arbitrage bounds. We state the result with short sale constraints, i.e. in the case where, with the notations of Section 2, $C$ is given by $J_S$.

Corollary 3.4 If there are short sale constraints, the buying price for any contingent claim $H$ is greater than or equal to $\inf_{g \in G^{J_S}} E\left[\frac{g_T}{g_0} H\right]$, and if there is a selling price for $H$, it is smaller than or equal to $\sup_{g \in G^{J_S}} E\left[\frac{g_T}{g_0} H\right]$.

We shall now pin down these arbitrage intervals, through the use of the superreplication cost.

3.2 Arbitrage bounds and superreplication cost

The aim of this subsection is to show that the upper bound of the arbitrage interval, in a general context with flows as well as in market models with frictions in $\mathcal{I}$, is given by the so-called superreplication cost; for a contingent flow $x \in \check{X}$, this cost corresponds to the minimum initial wealth needed to obtain, through available investments, at least as much as the flow $x$. This notion was originally introduced by Kreps (1981) for classical contingent claims in the context of incomplete markets (with no other imperfection).

All available investments still consist of a convex cone $J$ and we consider the set $M$ of contingent flows in $\check{X}$ that agents can “dominate” by using available investment opportunities,

$$M \equiv \left\{x \in \check{X},\ \exists \Phi \in J,\ \Phi_t \ge x_t\ \forall t > 0\right\}.$$

In words, $M$ is the set of flows $m$ for which there exists an available investment (in $J$), which is unambiguously better than $m$ after the initial date. We now introduce on $M$ the notion of superreplication cost.

Definition 3.5 For all $m \in M$, the superreplication cost of $m$ is denoted by $\bar{\pi}(m)$ and given by

$$\bar{\pi}(m) \equiv \inf\left\{\liminf_n \left(-\Phi^n_0\right);\ \Phi^n_t \ge m^n_t\ \forall t > 0,\ (\Phi^n, m^n) \in J \times M,\ m^n \to_{\check{X}} m\right\}.$$
The superreplication cost represents the infimum wealth necessary to subscribe to an investment opportunity which will provide us with at least as much as a flow arbitrarily close to $m$. As in Jouini and Kallal (1995a) for the case of proportional transaction costs, we start by describing the set $M$ and the functional $\bar{\pi}$.

Lemma 3.6 The set $M$ is a convex cone. If there is no free lunch for $J$, the price functional $\bar{\pi}$ is a sublinear$^3$ lower semi continuous$^4$ functional which takes values in $\mathbb{R}$.

We are now in a position to obtain a dual representation formula for the upper bound of the arbitrage intervals.

Proposition 3.7 If there is no free lunch for $J$, then for all $m \in M$,

$$\bar{\pi}(m) = \sup_{g \in G^J} \left\langle m, \frac{g}{g_0} \right\rangle_{\check{X},\check{Y}}.$$

This means that the superreplication cost of a contingent flow is equal to the supremum of its net present value over all admissible discount processes, which coincides with the upper bound of the arbitrage interval. If we now consider some $m \in M$ such that $-m \in M$, a symmetric argument yields

$$-\bar{\pi}(-m) = \inf_{g \in G^J} \left\langle m, \frac{g}{g_0} \right\rangle_{\check{X},\check{Y}}, \quad \text{or} \quad \left[-\bar{\pi}(-m), \bar{\pi}(m)\right] = \mathrm{cl}\left\{\left\langle \frac{g}{g_0}, m \right\rangle;\ g \in G^J\right\},$$

so that the bounds of the arbitrage intervals, in the general context with flows as well as for contingent claims in imperfect market models (belonging to $\mathcal{I}$), are completely characterized in terms of superreplication cost.

Note that for some authors, the “true” superreplication cost is given on $M$ by $\pi(m) = \inf\{(-\Phi_0);\ \Phi \in J,\ \Phi_t \ge m_t\ \forall t > 0\}$. It is proved in Napp (2000) that under the assumption of no-free-lunch, $\bar{\pi}$ is the largest lower semi continuous functional lying below $\pi$. Besides, we investigate when the upper bound of the arbitrage interval is effectively given by the “true” superreplication price $\pi$, in other words, when $\bar{\pi} = \pi$. We get the equality when $\pi$ is l.s.c. or whenever, for every scalar $\lambda$, the set of contingent flows that can be dominated by an available investment opportunity with initial value smaller than or equal to $\lambda$ is closed. More generally, we consider some specific market models in $\mathcal{I}$ for which simpler expressions for $\bar{\pi}$ can be obtained: discrete models as well as models with short sale constraints and imperfections on the numéraire if we assume that asset prices are continuous. Notice however that the approach with $\bar{\pi}$ has enabled us to characterize the arbitrage bounds in a general framework.

3 That is, for all $m_1, m_2$ in $M$ and all $\lambda \in \mathbb{R}_+$, we have $\bar{\pi}(m_1 + m_2) \le \bar{\pi}(m_1) + \bar{\pi}(m_2)$ and $\bar{\pi}(\lambda m_1) = \lambda \bar{\pi}(m_1)$.

4 That is, such that $\{(m, \lambda) \in M \times \mathbb{R};\ \bar{\pi}(m) \le \lambda\}$ is closed in $M \times \mathbb{R}$, or equivalently such that $\{m \in M;\ \bar{\pi}(m) \le \lambda\}$ is closed in $M$ for all $\lambda \in \mathbb{R}$, or equivalently such that $\liminf_n \bar{\pi}(m_n) \ge \bar{\pi}(m)$ whenever the sequence $(m_n) \subset M$ converges to $m \in M$.

3.3 A few remarks and extensions

In Napp (2000), we adopt an axiomatic approach. As in Harrison and Pliska (1981) and more recently Jouini (2000) for the case of proportional transaction costs, and Koehl and Pham (2000) for convex constraints, we start from a certain number of axioms that a price functional, defined on the set of contingent flows, must satisfy in order to be admissible. These axioms are linked not only to arbitrage but also to equilibrium considerations. We obtain a dual characterization of all admissible functionals. A similar axiomatic approach will be adopted in Section 4 for models with fixed transaction costs. We also study issues related to the viability (a notion introduced by Harrison and Kreps (1979)), or equivalently to the compatibility with an equilibrium, of the pricing rules we have found. We emphasize that all results obtained for a general contingent flow can be applied to contingent claims in securities market models with frictions belonging to $\mathcal{I}$.
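
The duality between the superreplication cost and the supremum of net present values (Proposition 3.7), and the resulting arbitrage interval, can be illustrated in the simplest incomplete market. The sketch below is only an illustration under stated assumptions: a one-period trinomial model with a zero interest rate and no friction other than incompleteness, so that admissible discount processes normalized by $g_0$ correspond to martingale measures. The prices, the call payoff and the function names are hypothetical. In a finite model of this kind no closure is needed, so the primal cost computed here plays the role of $\bar{\pi}$.

```python
# One-period trinomial model (assumed numbers): frictionless but incomplete.
# The bond is worth 1 at both dates; the stock moves from 100 to 120, 100 or 80.
from itertools import combinations

S0 = 100.0
S1 = (120.0, 100.0, 80.0)


def superreplication_cost(payoff, s0=S0, s1=S1, tol=1e-9):
    """Cheapest x + y*s0 such that x + y*s1[w] >= payoff[w] in every state w.

    This is a linear program in (x, y); when a finite optimum exists it is
    attained where two constraints bind, so we enumerate those points.
    """
    best = None
    for i, j in combinations(range(len(s1)), 2):
        if s1[i] == s1[j]:
            continue
        y = (payoff[i] - payoff[j]) / (s1[i] - s1[j])
        x = payoff[i] - y * s1[i]
        feasible = all(x + y * s >= h - tol for s, h in zip(s1, payoff))
        if feasible:
            cost = x + y * s0
            best = cost if best is None else min(best, cost)
    return best


def npv_range(payoff, steps=1000):
    """Min and max of E^q[payoff] over a grid of martingale measures.

    For these particular prices, q = (a, 1 - 2a, a) with 0 < a < 1/2 are
    exactly the strictly positive probabilities with E^q[S1] = S0.
    """
    values = []
    for k in range(1, steps):
        a = 0.5 * k / steps
        q = (a, 1.0 - 2.0 * a, a)
        values.append(sum(qw * h for qw, h in zip(q, payoff)))
    return min(values), max(values)


call = tuple(max(s - 100.0, 0.0) for s in S1)              # payoff (20, 0, 0)
upper = superreplication_cost(call)                        # role of pi_bar(m)
lower = -superreplication_cost(tuple(-h for h in call))    # role of -pi_bar(-m)
print(lower, upper, npv_range(call))
```

The grid of net present values stays strictly inside the interval delimited by `lower` and `upper` and approaches both ends, which is the finite-dimensional counterpart of the closure appearing in the dual formula.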

4 Models with fixed transaction costs

We consider in this section financial models where the available investment flows are subject to fixed transaction costs.

4.1 The characterization of the no-free-lunch assumption in a general model with fixed costs

We introduce some notation. We denote by $\mathcal{S}^f$ the collection of stopping times of $(\mathcal{F}_t)_{t \ge 0}$ taking a finite number of values in $\mathbb{R}_+$. For any $\tau \in \mathcal{S}^f$, we denote by $\mathcal{S}^f_\tau$ the class of stopping times $\nu$ in $\mathcal{S}^f$ with $\tau \le \nu$ a.s.

Definition 4.1 An investment consists of
1. an initial stopping time $\tau$ in $\mathcal{S}^f$
2. a starting event $B$ in $\mathcal{F}_\tau$
3. an $(\mathcal{F}_t)_{t \ge 0}$-adapted process $\Phi = (\Phi_t)_{t \ge 0}$ such that $\Phi$ is null outside $B$, and there exists a finite set of stopping times $\tau = \tau^\Phi_1 \le \dots \le \tau^\Phi_{N_\Phi}$ in $\mathcal{S}^f_\tau$ for which $\Phi_t = 0$ for all $t \notin \{\tau^\Phi_l\}_{l \in \{1, \dots, N_\Phi\}}$ and for all $l$, $\Phi_{\tau^\Phi_l} \in L^1(\Omega, \mathcal{F}_{\tau^\Phi_l}, P)$.

We shall call the process $\Phi$ the investment process. The starting stopping time and event can correspond to the stopping time and event at which one investor may subscribe to the investment opportunity. The investment process corresponds to the associated cash flow.

We still consider a convex cone $I$ of available investment processes and for all pairs $(\tau, B) \in \mathcal{S}^f \times \mathcal{F}_\tau$, we let $I^{\tau,B}$ (resp. $J^{\tau,B}$) denote the set of all available investment processes associated with investments with starting stopping time $\tau$ and starting event $B$ (resp. starting after $\tau$ and $B$, i.e. $J^{\tau,B} = \bigcup_{\nu \ge \tau,\ B' \subseteq B} I^{\nu,B'}$).

We assume that we can transfer wealth from one date to another, i.e. that, for all stopping times $\tau_1, \tau_2$ in $\mathcal{S}^f$ and for all random variables $\theta$ in $L^1(\Omega, \mathcal{F}_{\tau_1 \wedge \tau_2}, P)$, the process denoted by $\Phi^{(0;\theta,\tau_1,\tau_2)}$ and given by $\Phi_t^{(0;\theta,\tau_1,\tau_2)} = -\theta 1_{t=\tau_1} + \theta 1_{t=\tau_2}$ with starting stopping time $\tau_1 \wedge \tau_2$ and starting event equal to $\{\theta \ne 0\}$ belongs to the set $I$ of all available investment processes. We call the convex cone generated by all these investment processes the set of transfers.

We assume that it is not costless to subscribe to an investment, i.e. that there are “fixed costs” associated with any investment plan. More precisely, we associate with each investment $(\tau, B, \Phi)$ a nonnegative cost process $c^{(\tau,B,\Phi)} = \left(c_t^{(\tau,B,\Phi)}\right)_{t \ge 0}$; when there is no ambiguity, we shall sometimes write $c^\Phi$ instead of $c^{(\tau,B,\Phi)}$. The assumptions we make on the fixed costs are the following: we assume first that the cost process is $(\mathcal{F}_t)_{t \ge 0}$-adapted, which means that investors know at time $t$ the past and current values of the fixed cost but nothing more. We assume that the cost process $c^{(\tau,B,\Phi)}$ is null before the stopping time $\tau$, outside the event $B$, and outside a finite number of stopping times in $\mathcal{S}^f$. Besides, we assume that there is no fixed cost associated with the transferring of wealth from one date to another, i.e. for all $\Phi \in I$ and for all $\Psi$ in the set of transfers, we have $c^\Phi = c^{\Phi + \Psi}$. Moreover, the total cost associated with any investment opportunity is supposed to be bounded, i.e. there exists a positive real number $C$ such that $\sum_{t \ge 0} c_t^\Phi \le C$ for all $\Phi \in I$, which can be interpreted as the investors’ refusal to pay more than a certain given amount for fixed costs: this explains why we call these costs fixed costs as opposed to proportional costs. Finally, the fixed costs incurred at the initial stopping time must be “positive”, i.e. for all $(\tau, B) \in \mathcal{S}^f \times \mathcal{F}_\tau$, there exists a positive real number $\varepsilon_{\tau,B}$ such that all investment processes $\Phi \in I^{\tau,B}$ that are not transfers satisfy $c_\tau^\Phi \ge \varepsilon_{\tau,B}$ on $B$.

According to these assumptions, the fixed costs can be interpreted as information costs, opportunity costs, time costs, etc. In a financial market model, they can correspond to fixed brokerage fees. They can account for a sort of cost of accessing$^5$ the available investments or more generally for frictions of all kinds.

As usual, an arbitrage opportunity is an investment plan that yields a positive gain in some circumstance, without a countervailing threat of loss in other circumstances and a free lunch is a possibility of getting arbitrarily close to an arbitrage opportunity.

Definition 4.2 An arbitrage opportunity is an available investment $(\tau, B, \Phi)$ with $\Phi$ in $I$ such that $\Phi_t - c_t^\Phi \ge 0$ for all $t \ge 0$, and there exists a date for which it is nonnull.

5 This “cost of accessing the investment opportunities” can be understood in a general sense: it can be a fee (such as an investment tax), or the cost of setting up an office.

For all pairs $(\tau, B) \in \mathcal{S}^f \times \mathcal{F}_\tau$, we let $A^{\tau,B}$ denote the set of all nonnegative investment processes $u$ such that $u_\tau > \varepsilon_u$ on $B$ for some positive constant $\varepsilon_u$ and we obtain the following characterization of the absence of arbitrage opportunity in our model.

Lemma 4.3 There is no arbitrage opportunity if and only if for all $(\tau, B) \in \mathcal{S}^f \times \mathcal{F}_\tau$, we have $I^{\tau,B} \cap A^{\tau,B} = \emptyset$.

Using the same notations as for the definition of an arbitrage opportunity, we now introduce the notion of free lunch. We shall consider the set $I$ as a subset of $L^1(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$, considered in Section 2.1, and adopt the norm topology on this space.

Definition 4.4 There is a free lunch if and only if there exists a pair $(\tau, B) \in \mathcal{S}^f \times \mathcal{F}_\tau$ for which $\overline{I^{\tau,B} - L^1_+(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})} \cap A^{\tau,B} \ne \emptyset$, where the bar denotes the closure in $L^1(\hat{\Omega}, \hat{\mathcal{F}}, \hat{\mu})$.

See Jouini, Kallal and Napp (2000) for an interpretation of the definition of a free lunch in a securities market model with fixed transaction costs. Notice that the assumption of no-free-lunch in such a model is less restrictive than in the otherwise identical model without fixed costs. We now obtain the main result.

Theorem 4.5 There is no free lunch if and only if for all $(\tau, B) \in \mathcal{S}^f \times \mathcal{F}_\tau$, there exists an absolutely continuous probability measure $P^{\tau,B}$ with bounded density such that $P^{\tau,B}(B) = 1$ and for every investment process $\Phi$ in $J^{\tau,B}$, $E^{P^{\tau,B}}\left[\sum_{t \ge 0} \Phi_t\right] \le 0$.

This means that the absence of free lunch in our model with fixed trading costs is equivalent to the existence of a family of absolutely continuous probability measures under which the net present value of any available investment is nonpositive.

4.2 Application to securities market models with both fixed and proportional costs

We consider an economy where agents can trade a finite number of securities and we assume that these securities are subject to bid–ask spreads: at each date, there is not a unique price for a security but an ask price, at which investors can buy the security, and a bid price, at which they can sell the security. Notice that this model includes situations where there is a unique price process $Z$ and where the proportional transaction cost remains constant over time, i.e. situations where at each time $t$, investors must pay $Z_t(1 + c)$ for some positive constant $c$ to buy the security and receive $Z_t(1 - c)$ when selling it.
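
As a small numerical illustration of the constant proportional cost case, the following sketch computes the two cash flows of buying a nonnegative quantity at the ask price and selling it later at the bid price. The function names and the numbers are assumptions chosen for the example; fixed costs are deliberately left out here.

```python
# Cash flows of a round trip under a constant proportional cost c
# (the special case mentioned in the text). Names and numbers are assumed.

def ask(z, c):
    """Price paid to buy one unit when the quoted price is z."""
    return z * (1.0 + c)


def bid(z, c):
    """Price received for selling one unit when the quoted price is z."""
    return z * (1.0 - c)


def round_trip_flows(theta, z_buy, z_sell, c):
    """Flows (at the purchase date, at the sale date) of buying theta >= 0
    units and selling them later."""
    if theta < 0:
        raise ValueError("theta must be nonnegative")
    return (-theta * ask(z_buy, c), theta * bid(z_sell, c))


def break_even_ratio(c):
    """Price ratio z_sell / z_buy at which the two flows net to zero."""
    return (1.0 + c) / (1.0 - c)


out_flow, in_flow = round_trip_flows(theta=10, z_buy=100.0, z_sell=105.0, c=0.01)
print(out_flow, in_flow, out_flow + in_flow, break_even_ratio(0.01))
```

With a cost of 1% on each leg, the quoted price has to rise by a little more than 2% before a round trip stops losing money, which is one way to see why the bid and ask processes, rather than $Z$ itself, drive the no-free-lunch condition.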

More precisely, we consider $(n + 1)$ securities and for each security $k$, $0 \le k \le n$, we let $(\overline{Z}^k_t)_{t \ge 0}$ and $(\underline{Z}^k_t)_{t \ge 0}$ denote respectively the ask and bid price process. We assume that the $(n + 1)$-dimensional processes $\overline{Z}$ and $\underline{Z}$ are right-continuous and of class $D^f$, i.e. that the families $\{\overline{Z}_\tau\}_{\tau \in \mathcal{S}^f}$ and $\{\underline{Z}_\tau\}_{\tau \in \mathcal{S}^f}$ are uniformly integrable. For each $k$ in $\{0, \dots, n\}$, for all stopping times $\tau_1$ and $\tau_2$ in $\mathcal{S}^f$, for all nonnegative real-valued bounded random variables $\theta$ in $\mathcal{F}_{\tau_1 \wedge \tau_2}$, we let $\Phi^{(k;\theta,\tau_1,\tau_2)}$ denote the process given by

$$\Phi_t^{(k;\theta,\tau_1,\tau_2)} = \theta\left(-\overline{Z}^k_{\tau_1} 1_{t=\tau_1} + \underline{Z}^k_{\tau_2} 1_{t=\tau_2}\right)$$

and we assume that the set $I$ of all available investment processes consists of the convex cone generated by all the processes $\Phi^{(k;\theta,\tau_1,\tau_2)}$. This means that all available investment opportunities are related to the buying and selling of the $(n + 1)$ securities, at some stopping times and in random quantities. We still assume that we can transfer wealth without friction, i.e. we set for all $t$, $\overline{Z}^0_t = \underline{Z}^0_t = 1$.

As in the previous section, we assume that there are fixed costs associated with these investment opportunities. The assumptions made on the fixed costs remain the same as above but their interpretation in this specific setting can be made more precise. First, if an investor does not trade in the risky securities at time $t$, then he does not pay any additional cost; but in order to buy at stopping time $\tau$ a “portfolio” $\Theta_\tau$, he must pay $\Theta_\tau \cdot \overline{Z}_\tau + c_\tau^\Theta$, where $c_\tau^\Theta$ denotes the fixed cost to be paid by the investor at stopping time $\tau$ when following the strategy $\Theta$. The fixed cost can depend upon the strategy followed by the investors: for instance at the same date and event, it can be different according to what the investor has done before that date and event; this means equivalently that the fixed costs to be paid are not necessarily the same for all investors. Second, the aggregated fixed costs are bounded independently of the chosen strategy and independently of the considered investor, or in other words we assume that there exists a positive real number $C$ such that for all strategies $\Theta$, $\sum_{t \ge 0} c_t^\Theta \le C$. This means in particular that the fixed costs to be paid at some date $t$ are bounded independently of the amount traded, which explains why we call them fixed costs as opposed to proportional costs. Finally, we assume that at the first time an investor trades, he incurs a positive fixed cost, which is to be interpreted as a cost of accessing the market.

We get the following characterization of the absence of free lunch in a model with proportional and fixed transaction costs.

Theorem 4.6 There is no free lunch in our model with fixed and proportional transaction costs if and only if for all $(\tau, B) \in \mathcal{S}^f \times \mathcal{F}_\tau$, there exists an absolutely
58

E. Jouini and C. Napp

continuous probability measure P τ ,B with bounded density such that P τ ,B (B) = 1 and some process S τ ,B satisfying Z t 1 B∩{τ ≤t} ≤ Stτ ,B 1 B∩{τ ≤t} ≤ Z t 1 B∩{τ ≤t} τ ,B τ ,B τ ,B EP for t ≥ s. St∨τ | Fs∨τ = Ss∨τ This means that for all (τ , B) ∈ S f × Fτ there exists an absolutely continuous probability measure P τ ,B that transforms some price process S τ ,B lying after τ and on B between the discounted bid and ask price processes into a martingale from the stopping time τ and event B. In the case where there is no proportional transaction cost, i.e. if Z = Z , we find that the absence of free lunch in a securities market model with fixed transaction costs is equivalent to the existence of a family of absolutely continuous martingale measures. Our characterization of the no-freelunch assumption is then weaker than the classical one, and leads to a larger class of arbitrage-free models.

4.3 Pricing issues in securities market models with fixed transaction costs

The framework is the same as in the previous section except that, in order to concentrate on the fixed costs, we assume that $\overline{Z} = \underline{Z}$; in other words, there is no proportional transaction cost. As in Section 3, we consider a finite time horizon $T$, and a contingent claim $H$ to consumption at the terminal date $T$ is a random variable belonging to $L^1(\Omega, \mathcal{F}_T, P)$. A contingent claim $H$ is said to be attainable (in the model without fixed cost) if there exists some available investment process $\Phi$ in $I^{0,\Omega}$ such that $\Phi_t = 0$ for all $t \in\, ]0,T[$ and $\Phi_T = H$. Note that the set $M$ of all attainable contingent claims is a linear space. We shall now define and characterize pricing rules $p$ on $M$ that are admissible. As in Section 3, we introduce the definition of the superreplication price of $H$, $\pi^c(H)$, in our framework with fixed costs:

$$\pi^c(H) \equiv \inf\left\{-\Phi_0 + c^\Phi_0 \;:\; \Phi \in I^{0,\Omega},\ \Phi_t - c^\Phi_t \ge 0 \text{ for all } t \in\, ]0,T[,\ \Phi_T \ge H + c^\Phi_T\right\}.$$

Definition 4.7 An admissible pricing rule on $M$ is a functional $p$ defined on $M$ such that
1. $p$ induces no arbitrage, i.e., it is not possible to find processes $\Phi^1,\dots,\Phi^n$ in $I^{0,\Omega}$ such that $\Phi^i_t = 0$ for all $t \in\, ]0,T[$ and for which $\sum_{i=1}^n p(\Phi^i_T) \le 0$, $\sum_{i=1}^n \Phi^i_T \ge 0$ and one of the two is nonnull.
2. $p(H) \le \pi^c(H)$.


Part 1 is the usual no-arbitrage condition. Part 2 says that an admissible price for the contingent claim $H$ cannot exceed its superreplication price: if it is possible to obtain a payoff at least equal to $H$ at a cost $\pi^c(H)$, then no rational agent (who prefers more to less) will agree to pay more than $\pi^c(H)$ for the contingent claim $H$. The following proposition characterizes the admissible pricing rules on $M$ through the use of the absolutely continuous martingale measures obtained in Theorem 4.6.

Proposition 4.8 Under the assumption of no-free-lunch, any admissible pricing rule $p$ on $M$ can be written as

$$p(H) = E^{P^*}[H] + c(H) \quad \text{for all } H \text{ in } M$$

where $P^*$ is any absolutely continuous martingale measure and $c$ is a bounded functional defined on $M$. If we assume that for a large enough scalar $\lambda$, we have $p(\lambda x) < \lambda\,[p(x)]$, then the fixed cost functional is nonnegative; moreover, if we assume that there exists $\varepsilon > 0$ such that for a large enough $\lambda$, $p(\lambda x) < \lambda\,[p(x) - \varepsilon]$, then the fixed cost is greater than or equal to this positive constant $\varepsilon$.

Notice that Proposition 4.8 implies that $p(\lambda H)/\lambda \to E^{P^*}[H]$ as $\lambda \to \infty$ for any attainable contingent claim $H$, where $P^*$ is any absolutely continuous martingale measure. This means that the unit price of any attainable contingent claim $H$ is equal to $E^{P^*}[H]$ in the limit of large quantities. In particular, in a Black–Scholes-like model with fixed costs, the unique asymptotic price for any contingent claim is given by the usual Black–Scholes price.

Appendix A

Proof of Theorem 2.3 The proof is adapted from Yan (1980). It is very similar to the one in Jouini and Napp (2001), where Assumption A is also made. Let $x \in \overline{J - X_+} \cap X_+$, $x = \lim_n x_n$, where for all $n$, $x_n \le \Phi_n$, $\Phi_n \in J$. Then, since $g$ is nonnegative and $g|_J \le 0$, for all $n$, $\langle x_n, g\rangle_{X,Y} \le \langle \Phi_n, g\rangle_{X,Y} \le 0$. This implies $\langle x, g\rangle_{X,Y} \le 0$, hence $x = 0$.

Conversely, if $\overline{J - X_+} \cap X_+ = \{0\}$, then for all $x \ne 0$ belonging to $X_+$, the Hahn–Banach Separation Theorem yields the existence of $g \ne 0$, belonging to $Y$, such that $g|_{\overline{J - X_+}} \le 0 < \langle x, g\rangle_{X,Y}$. It is easy to check that $g$ is nonnegative. Let $G^J$ denote the nonempty set of all nonnegative $g \in Y$, $g|_J \le 0$. We start by proving that for all dates $t$, there exists a process $g^t \in G^J$ such that $g^t_t > 0$ $P$ a.s. Let $\mathcal{S}^t$ be the family of equivalence classes of subsets of $\Omega$ formed by the supports of the $g_t$ for all $g$ in $G^J$. By applying the Separation Theorem to the element $x$ of $X_+$ such that $x_t = 1$, $x_s = 0$, $\forall s \ne t$, we get that the family $\mathcal{S}^t$ is not reduced to the empty set. It is easy to see that the family $\mathcal{S}^t$ is closed under countable unions. Hence there is $g^t$ in $G^J$ such that $S^t \equiv \{g^t_t > 0\}$ satisfies $P(S^t) = \sup\{P(S) ;\ S \in \mathcal{S}^t\}$. We necessarily have $P(S^t) = 1$; indeed, if $P(S^t) < 1$, then we can apply the Separation Theorem to $x$ such that $x_t = 1_{\Omega \setminus S^t}$, $x_s = 0$, $\forall s \ne t$, and get the existence of $\tilde{g}^t \in G^J$ with $\langle x, \tilde{g}^t\rangle_{X,Y} > 0$. Then $\{g^t_t + \tilde{g}^t_t > 0\}$ would be an element of $\mathcal{S}^t$ with $P$-measure strictly greater than that of $S^t$: a contradiction.

Now we show that there exists $g \in G^J$ such that $g_{d_n} > 0$ almost surely for all $d_n \in d$, where $d$ is the sequence introduced in Assumption A. We consider the process $g$ such that for all $t \ge 0$, $g_t = \sum_{n\ge 0} a_n g^{d_n}_t$, where $(a_n)_{n\ge 0}$ is a sequence of positive scalars such that $\sum_{n\ge 0} a_n \|g^{d_n}\|_Y < \infty$. We find that $g$ belongs to $G^J$ and satisfies $g_{d_n} > 0$ almost surely for all $d_n \in d$.

It remains to show that for all $t$, $g_t > 0$ $P$ a.s. Assume that for some $T$ outside the set of dates $\{d_n ;\ n \in \mathbb{N}\}$ we have just considered, the event $B_T \equiv \{g_T = 0\}$ has positive $P$-probability; according to Assumption A, we know that there exists $\Phi \in J$ such that $\Phi_T = 0$ outside $B_T$, $\Phi_t = 0$ $\forall t < T$, $\Phi_t \ge 0$ $\forall t > T$ and $\exists d_n \in d$, $P(\Phi_{d_n} > 0) > 0$. For this particular investment $\Phi \in J$, we would have $\langle \Phi, g\rangle_{X,Y} \ge E[\Phi_{d_n} g_{d_n}] > 0$: a contradiction.

Proof of Lemma 3.1 Since $C$ satisfies Assumption A, and $\tilde{C}$ is the convex cone generated in $X$ by $C$ and $\Phi \equiv (\Phi_t)_{t\ge 0}$, a price $(-\Phi_0)$ is a fair price for $\Phi$ if and only if there exists $g \in G^C$ satisfying $E\left[\sum_{t\ge 0} g_t \Phi_t\right] \le 0$ or, using the strict positivity of $g$, $(-\Phi_0) \ge \left\langle \check{\Phi}, \frac{g}{g_0}\right\rangle_{\check{X},\check{Y}}$.

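Proof of Corollary 3.2 Since $\left\{\frac{g}{g_0},\ g \in G^C\right\}$ is a convex set, if $\left\langle \check{\Phi}, \frac{g^1}{g^1_0}\right\rangle_{\check{X},\check{Y}} \le -\Phi_0 \le \left\langle \check{\Phi}, \frac{g^2}{g^2_0}\right\rangle_{\check{X},\check{Y}}$ for $g^1, g^2 \in G^C$, then there exists $g \in G^C$, $g_0 = 1$, such that $-\Phi_0 = \left\langle \check{\Phi}, \frac{g}{g_0}\right\rangle_{\check{X},\check{Y}}$.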

Proof of Corollary 3.3 Immediate using Lemma 3.1.

Proof of Corollary 3.4 Immediate applying Corollary 3.3.

Proof of Lemma 3.6 The proof is adapted from Kreps (1981) and Jouini and Kallal (1995a). We shall repeatedly use the fact (F) that, by a standard diagonalization procedure, there exists a sequence $(\Phi^n, m^n)$, with $\Phi^n_t \ge m^n_t$ for all $t > 0$ and $m^n \to m$ in $\check{X}$, for which $\bar{\pi}(m) = \lim_n (-\Phi^n_0)$.

By definition, for all $m \in M$, $\bar{\pi}(m) < \infty$. If there is no free lunch, for all $g \in G^J$, we have $\bar{\pi}(m) \ge \left\langle m, \frac{g}{g_0}\right\rangle_{\check{X},\check{Y}}$ for all $m \in M$; indeed, assume that there exists a sequence $(\Phi^n, m^n)$ in $J \times M$ such that $\Phi^n_t \ge m^n_t$ $\forall t > 0$ and $m^n \to m$ in $\check{X}$; then for all $g \in G^J$, $-\Phi^n_0 \ge \left\langle m^n, \frac{g}{g_0}\right\rangle_{\check{X},\check{Y}} \to_n \left\langle m, \frac{g}{g_0}\right\rangle_{\check{X},\check{Y}}$, so that using (F), $\bar{\pi}(m) \ge \left\langle m, \frac{g}{g_0}\right\rangle_{\check{X},\check{Y}}$. In particular, this implies that for all $m \in M$, $\bar{\pi}(m) > -\infty$ and for all $m \ne 0$ belonging to $\check{X}_+ \cap M$, $\bar{\pi}(m) > 0$.

Since $J$ is a convex cone, it is easy to see that $M$ is also a convex cone. Using (F), it is immediate that $\bar{\pi}$ is such that for all $m_1, m_2$ in $M$ and all $\lambda \in \mathbb{R}^*_+$, we have $\bar{\pi}(m_1 + m_2) \le \bar{\pi}(m_1) + \bar{\pi}(m_2)$ and $\bar{\pi}(\lambda m_1) = \lambda \bar{\pi}(m_1)$. By definition of $\bar{\pi}$, we have $\bar{\pi}(0) \le 0$; we have seen that for all $g \in G^J$, $\bar{\pi}(m) \ge \left\langle m, \frac{g}{g_0}\right\rangle_{\check{X},\check{Y}}$ for all $m \in M$, thus $\bar{\pi}(0) = 0$.

Let us show that $\bar{\pi}$ is l.s.c. Let $\lambda \in \mathbb{R}$ and $(m_n)$ be a sequence in $M$ converging to $m \in M$ such that $\bar{\pi}(m_n) \le \lambda$ for all $n$. Then, using (F), for all $n \ge 1$, there exists $(\Phi^n, m^{*n})$ in $J \times M$ such that $\|m_n - m^{*n}\|_{\check{X}} \le 1/n$, $\Phi^n_t \ge m^{*n}_t$ $\forall t > 0$ and $-\Phi^n_0 \le \lambda + 1/n$. Since $m^{*n}$ converges to $m$, we must then have $\bar{\pi}(m) \le \lambda$ and the set $\{m \in M ;\ \bar{\pi}(m) \le \lambda\}$ is closed.

Proof of Proposition 3.7 We show that $(M, \bar{\pi})$ satisfies the assumptions of Corollary B.2 in Appendix B. If there is no free lunch, $\bar{\pi}$ is an l.s.c. functional on the convex cone $M$ (Lemma 3.6). By definition of $M$ and $\bar{\pi}$, we have $\check{X}_- \subseteq M$ and $\bar{\pi} \le 0$ on $\check{X}_-$. Since there is no free lunch for $J$, $G^J \ne \emptyset$ and for all $g \in G^J$, $\bar{\pi}(m) \ge \left\langle m, \frac{g}{g_0}\right\rangle_{\check{X},\check{Y}}$, hence there exists a positive continuous linear functional on $\check{X}$ whose restriction to $M$ lies below $\bar{\pi}$. We can apply Corollary B.2, and we obtain that for all $m \in M$, $\bar{\pi}(m) = \sup\{l(m),\ l \in \check{Y},\ l > 0,\ l|_M \le \bar{\pi}\}$. It is then easy to verify that a positive $l \in \check{Y}$ satisfies $l|_M \le \bar{\pi}$ if and only if it is of the form $l = \left(\frac{g_t}{g_0}\right)_{t>0}$ for some $g \in G^J$. Indeed, we have seen in the proof of Lemma 3.6 that any $g \in G^J$, $g_0 = 1$, satisfies $g|_M \le \bar{\pi}$; conversely, if $l|_M \le \bar{\pi}$, then for all $\Phi \in J$, $E\left[\sum_{t>0} l_t \Phi_t\right] \le -\Phi_0$ and, letting $l_0 = 1$, $(l_t)_{t\ge 0}|_J \le 0$.

Proof of Lemma 4.3 If there is an arbitrage opportunity, then there exists an available investment $(\tau, B, \Phi)$ for which $\Phi_t - c^\Phi_t \ge 0$ for all $t \ge 0$, hence $\Phi_\tau \ge c^\Phi_\tau \ge \varepsilon^{\tau,B}$ on $B$ and $\Phi_t \ge 0$ for all $t \ge 0$, so that $\Phi \in I^{\tau,B} \cap A^{\tau,B}$. Conversely, suppose that there exists $\Phi \in I^{\tau,B} \cap A^{\tau,B}$. Then there exists $\varepsilon^\Phi \in \mathbb{R}^*_+$ such that $\Phi_\tau \ge \varepsilon^\Phi$. The investment process $\lambda\Phi$ with $\lambda$ such that $\lambda \varepsilon^\Phi \ge C$ enables us to get enough at the initial stopping time to cover, through wealth transfer, present and future transaction costs.

Proof of Theorem 4.5 Using Lemma 4.3, it is easy to see that there is no free lunch if and only if for all $(\tau,B) \in S^f \times \mathcal{F}_\tau$, $\overline{K^{\tau,B} - L^1_+} \cap A^B = \emptyset$, where $K^{\tau,B} \equiv \left\{\sum_{t\ge 0} \Phi_t ;\ \Phi \in J^{\tau,B}\right\}$, $A^B \equiv \{f \in L^1 ;\ \exists \varepsilon > 0,\ f \ge \varepsilon \text{ on } B\}$ and the bar denotes the closure in $L^1(\Omega, \mathbb{R})$.

Assume first the existence of a family of absolutely continuous probability measures as in the theorem. Let $u$ belong to $\overline{K^{\tau,B} - L^1_+}$. Then there exist sequences $(u_n)_{n\ge 0}$ and $(m_n)_{n\ge 0}$ such that $u_n \le m_n$, $m_n \in K^{\tau,B}$ and $u_n \to u$ in $L^1$. Since $E^{P^{\tau,B}}[m_n] \le 0$, we have $E^{P^{\tau,B}}[u_n] \le 0$ and, since $P^{\tau,B}$ has bounded density, we have $E^{P^{\tau,B}}[u_n] \to E^{P^{\tau,B}}[u]$ as $n \to \infty$. Then $E^{P^{\tau,B}}[u] \le 0$ and it is not possible to have $u \ge \varepsilon$ on $B$ for some positive real number $\varepsilon$.

Conversely, assume now that for all $(\tau,B)$ in $S^f \times \mathcal{F}_\tau$, we have $\overline{K^{\tau,B} - L^1_+} \cap A^B = \emptyset$. Since $J^{\tau,B}$ is a convex cone, the set $K^{\tau,B}$ is also a convex cone and we can apply a strict separation theorem in $L^1$ to the closed convex cone $\overline{K^{\tau,B} - L^1_+}$ and $\{1_B\}$ to find $g^{\tau,B}$ in $L^\infty$ and two real numbers $\alpha$ and $\beta$ with $\alpha < \beta$ such that $g^{\tau,B}|_{\overline{K^{\tau,B} - L^1_+}} \le \alpha < \beta < \langle 1_B, g^{\tau,B}\rangle$. It is easy to see that $g^{\tau,B} \ge 0$, that we can take $\alpha = 0$, that $g^{\tau,B} \ne 0$ on $B$ and that $g^{\tau,B}|_{K^{\tau,B}} \le 0$. Letting then $P^{\tau,B}$ be given by $dP^{\tau,B}/dP \equiv \frac{1_B\, g^{\tau,B}}{E[1_B\, g^{\tau,B}]}$, we get the desired result.
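On a finite probability space, the normalisation used at the end of the proof of Theorem 4.5 can be checked by hand. The following Python sketch is only an illustration: the four-point sample space, the uniform measure `P`, the event `B` and the nonnegative functional `g` are arbitrary assumptions, not taken from the text. It builds the density $1_B g / E[1_B g]$ and verifies that the resulting measure is a probability measure carried by $B$ with bounded density.

```python
from fractions import Fraction

# Hypothetical finite sample space {0, 1, 2, 3} with uniform reference measure P.
P = [Fraction(1, 4)] * 4
B = {0, 1}                                                    # the event B
g = [Fraction(1, 2), Fraction(0), Fraction(2), Fraction(1)]   # g >= 0, not identically 0 on B


def expectation(values, prob):
    """Expectation of a random variable given as a list of values per state."""
    return sum(v * p for v, p in zip(values, prob))


# 1_B * g, state by state.
indicator_B = [1 if omega in B else 0 for omega in range(len(P))]
restricted_g = [i * gi for i, gi in zip(indicator_B, g)]

# Normalising constant E[1_B g]; it must be positive for the density to exist.
normaliser = expectation(restricted_g, P)

# Density dP^{tau,B}/dP = 1_B g / E[1_B g] and the induced measure.
density = [x / normaliser for x in restricted_g]
P_tau_B = [d * p for d, p in zip(density, P)]

print("density:", density)
print("P^{tau,B}:", P_tau_B)
```

Exact rational arithmetic is used so that the identities $P^{\tau,B}(\Omega) = P^{\tau,B}(B) = 1$ hold without rounding error; states where $g$ vanishes receive zero mass, which is why the measure is absolutely continuous but in general not equivalent to $P$.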

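Proof of Theorem 4.6 Assume first that there exist a family of probability measures and an associated family of price processes as in the theorem. Then, according to the proof of Theorem 4.5, and adopting the same notation, we only need to prove that for all $(\tau,B) \in S^f \times \mathcal{F}_\tau$ and for all random variables $u$ in $K^{\tau,B}$, $E^{P^{\tau,B}}[u] \le 0$. Using the specific form of $K^{\tau,B}$, we are reduced to proving that $E^{P^{\tau,B}}\left[\theta\left(\underline{Z}^k_{\tau_2} - \overline{Z}^k_{\tau_1}\right)\right] \le 0$ for all $\tau_1, \tau_2 \in S^f_\tau$, $k \in \{1,\dots,n\}$ and nonnegative $\theta \in L^\infty(\Omega, \mathcal{F}_{\tau_1\wedge\tau_2}, P)$. For such $\theta$, we have

$$E^{P^{\tau,B}}\left[\theta\left(\underline{Z}^k_{\tau_2} - \overline{Z}^k_{\tau_1}\right)\right] \le E^{P^{\tau,B}}\left[\theta\, E^{P^{\tau,B}}\left[\left(S^{\tau,B}_{\tau_2}\right)^k - \left(S^{\tau,B}_{\tau_1}\right)^k \mid \mathcal{F}_{\tau_1\wedge\tau_2}\right]\right].$$

By the optional sampling theorem (see e.g. Karatzas and Shreve (1988)), we obtain that

$$E^{P^{\tau,B}}\left[\left(S^{\tau,B}_{\tau_2}\right)^k \mid \mathcal{F}_{\tau_1\wedge\tau_2}\right] = \left(S^{\tau,B}_{\tau_1\wedge\tau_2}\right)^k = E^{P^{\tau,B}}\left[\left(S^{\tau,B}_{\tau_1}\right)^k \mid \mathcal{F}_{\tau_1\wedge\tau_2}\right],$$

so that the right-hand side of the previous inequality vanishes.

For the converse implication, we assume that there is no free lunch, so we know from Theorem 4.5 that for all $(\tau,B)$ in $S^f \times \mathcal{F}_\tau$, there exists an absolutely continuous probability measure $P^{\tau,B}$ with bounded density such that $P^{\tau,B}(B) = 1$ and, for all $\Phi \in J^{\tau,B}$, $E^{P^{\tau,B}}\left[\sum_{t\ge 0} \Phi_t\right] \le 0$. For all $k \in \{1,\dots,n\}$, for any stopping times $\tau_1$ and $\tau_2$ in $S^f_\tau$ and for all $A$ in $\mathcal{F}_{\tau_1\wedge\tau_2}$, the investment process $\Phi^{(k;1_A,\tau_1,\tau_2)}$ belongs to $J^{\tau,B}$ and we get that $E^{P^{\tau,B}}\left[-\overline{Z}^k_{\tau_1} + \underline{Z}^k_{\tau_2} \mid \mathcal{F}_{\tau_1\wedge\tau_2}\right] \le 0$, thus

$$E^{P^{\tau,B}}\left[\underline{Z}^k_{\tau_2} \mid \mathcal{F}_{\tau_1\wedge\tau_2}\right] \le E^{P^{\tau,B}}\left[\overline{Z}^k_{\tau_1} \mid \mathcal{F}_{\tau_1\wedge\tau_2}\right]. \tag{A.1}$$

For all $\nu \in S^f_\tau$, we consider the two $n$-dimensional families $(\underline{\tilde{Z}}_\nu)_{\nu\in S^f_\tau}$ and $(\overline{\tilde{Z}}_\nu)_{\nu\in S^f_\tau}$ given by

$$\underline{\tilde{Z}}_\nu = \operatorname*{ess\,sup}_{\kappa\in S^f_\nu} E^{P^{\tau,B}}\left[\underline{Z}_\kappa \mid \mathcal{F}_\nu\right], \qquad \overline{\tilde{Z}}_\nu = \operatorname*{ess\,inf}_{\kappa\in S^f_\nu} E^{P^{\tau,B}}\left[\overline{Z}_\kappa \mid \mathcal{F}_\nu\right].$$

In words, $\underline{\tilde{Z}}^k_\nu$ is the supremum of the conditional expected value of the proceeds from the strategies that consist of going short in the security $k$ (and investing the proceeds in security 0) after the stopping time $\nu$. The random variable $\overline{\tilde{Z}}_\nu$ is defined symmetrically. It is a standard result in optimal stopping that for all $\kappa$ in $S^f_\nu$

$$E^{P^{\tau,B}}\left[\underline{\tilde{Z}}_\kappa \mid \mathcal{F}_\nu\right] \le \underline{\tilde{Z}}_\nu, \qquad E^{P^{\tau,B}}\left[\overline{\tilde{Z}}_\kappa \mid \mathcal{F}_\nu\right] \ge \overline{\tilde{Z}}_\nu.$$

Now, taking $\nu \equiv s \vee \tau$ and $\kappa \equiv t \vee \tau$ for all $(s,t)$ for which $s \le t$, we obtain that the process $(\underline{\tilde{Z}}_{t\vee\tau})_{t\ge 0}$ is a $P^{\tau,B}$-supermartingale for $(\mathcal{F}_{t\vee\tau})_{t\ge 0}$ and that the process $(\overline{\tilde{Z}}_{t\vee\tau})_{t\ge 0}$ is a $P^{\tau,B}$-submartingale for $(\mathcal{F}_{t\vee\tau})_{t\ge 0}$. Using inequality (A.1), we have $\underline{\tilde{Z}}_{t\vee\tau} \le \overline{\tilde{Z}}_{t\vee\tau}$. Now, using Lemma 3 in Jouini and Kallal (1995b) or Proposition 2.6 in Choulli and Stricker (1997), we get that there is a process $S^{\tau,B}$ lying between $(\underline{\tilde{Z}}_{t\vee\tau})_{t\ge 0}$ and $(\overline{\tilde{Z}}_{t\vee\tau})_{t\ge 0}$ on $B$, which is a $P^{\tau,B}$-martingale for $(\mathcal{F}_{t\vee\tau})_{t\ge 0}$. By definition, we have $\underline{Z} \le \underline{\tilde{Z}}$ and $\overline{\tilde{Z}} \le \overline{Z}$ after $\tau$ and on $B$, so that after $\tau$ and on $B$, $\underline{Z} \le \underline{\tilde{Z}} \le \overline{\tilde{Z}} \le \overline{Z}$. The process $S^{\tau,B}$ is then automatically between $\underline{Z}$ and $\overline{Z}$, after $\tau$ and on $B$, which completes the proof.

Proof of Proposition 4.8 We have assumed that there is no arbitrage in the primitive market, so that if $\Phi$ and $\Psi$ in $I^{0,\Omega}$ are such that for all $t \in\, ]0,T]$, $\Phi_t = \Psi_t$, then $\Phi_0 = \Psi_0$. We define on $M$ a linear functional $l$ given by $l(\Phi_T) = -\Phi_0$, the cost of setting up $\Phi$. Now it is easy to see that for all $H$ in $M$,

$$\lim_{\lambda\to+\infty} \frac{\pi^c(\lambda H)}{\lambda} = \lim_{\lambda\to+\infty} \frac{-\pi^c(-\lambda H)}{\lambda} = l(H).$$

Since there is no arbitrage, we must have $p(H) \ge -p(-H)$ so that

$$-\pi^c(-H) \le -p(-H) \le p(H) \le \pi^c(H),$$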

and the price functional $p$ can be written as the sum of a continuous linear functional and a fixed cost, i.e., for all $H$, $p(H) = l(H) + c(H)$ where $c(\lambda H)/\lambda \to 0$ as $\lambda \to \infty$. Notice that $c(H) \equiv p(H) - l(H) \le \pi^c(H) - l(H) \le C$. Consequently, in the absence of free lunch, the fair price $p(H)$ associated with any attainable contingent claim $H$ is given by

$$p(H) = E^{P^*}[H] + c(H)$$

where $P^*$ is any absolutely continuous martingale measure.
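The decomposition of the price into a linear part plus a bounded fixed cost makes the asymptotic statement easy to see numerically. The Python sketch below is an illustration under stated assumptions only: it takes a Black–Scholes-like market with zero interest rate (prices are discounted, as with $Z^0 \equiv 1$), uses the Black–Scholes call price as the linear part $l(H)$, and models the fixed cost as a hypothetical constant `FIXED_COST` charged on any nonzero trade. The function names and parameter values are not from the text.

```python
from math import erf, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_zero_rate(spot: float, strike: float, sigma: float, maturity: float) -> float:
    """Black-Scholes call price with zero interest rate and no dividends."""
    d1 = (log(spot / strike) + 0.5 * sigma**2 * maturity) / (sigma * sqrt(maturity))
    return spot * norm_cdf(d1) - strike * norm_cdf(d1 - sigma * sqrt(maturity))


def price_with_fixed_cost(quantity: float, linear_unit_price: float, fixed_cost: float) -> float:
    """p(lambda * H) = lambda * l(H) + c, with a constant (hence bounded) fixed cost c."""
    return quantity * linear_unit_price + fixed_cost


LINEAR_PRICE = bs_call_zero_rate(spot=100.0, strike=100.0, sigma=0.2, maturity=1.0)
FIXED_COST = 5.0  # hypothetical bound C on the aggregated fixed costs

# Unit price p(lambda * H) / lambda for increasing quantities lambda.
unit_prices = {
    lam: price_with_fixed_cost(lam, LINEAR_PRICE, FIXED_COST) / lam
    for lam in (1, 10, 100, 1000, 10000)
}

for lam, unit in unit_prices.items():
    print(f"lambda = {lam:>6}: unit price = {unit:.6f} (linear part = {LINEAR_PRICE:.6f})")
```

The gap between the unit price and the linear part is exactly `FIXED_COST / lam` in this toy model, so it vanishes as the traded quantity grows, which is the content of $c(\lambda H)/\lambda \to 0$.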

Appendix B

Lemma B.1 Any l.s.c. sublinear functional $s$ on a convex cone $K \subseteq \check{X}$ can be written as the supremum over all continuous linear functionals on $\check{X}$ whose restriction to $K$ lies below $s$, i.e. for all $k \in K$,

$$s(k) = \sup_{l\in\check{Y},\ l|_K \le s} l(k).$$

Proof We adapt the proof of the Fenchel–Moreau Theorem. Let $t(k) \equiv \sup\{l(k),\ l \in \check{Y},\ l|_K \le s\}$. It is immediate that for all $k \in K$, $s(k) \ge t(k)$. Suppose that there exists $k_0 \in K$ such that $t(k_0) < s(k_0)$. Let $A \equiv \{(z,\lambda) \in K \times \mathbb{R},\ s(z) \le \lambda\}$. Since $s$ is sublinear, $A$ is a convex cone. Then the closure of $A$ in $\check{X} \times \mathbb{R}$, denoted by $\bar{A}$, is a closed convex cone. Since $s$ is l.s.c., $(k_0, t(k_0)) \notin \bar{A}$. By the Hahn–Banach Separation Theorem, there exists a continuous linear functional $\varphi$ defined on $\check{X} \times \mathbb{R}$ and $\alpha \in \mathbb{R}$ such that

$$\varphi(k_0, t(k_0)) < \alpha \le \varphi(z,\lambda) \quad \text{for all } (z,\lambda) \in \bar{A}. \tag{B.1}$$

The set $\bar{A}$ being a cone, we can take $\alpha = 0$. Hence there exist a continuous linear functional $\varphi_1$ on $\check{X}$ and $\beta \in \mathbb{R}$ for which $\varphi_1(k_0) + \beta\,[t(k_0)] < 0 \le \varphi_1(z) + \beta\lambda$ for all $(z,\lambda) \in \bar{A}$. By taking $z \in D(s)$, i.e. $z$ such that $s(z) < \infty$, and $\lambda = n \to \infty$ in the preceding inequality, we see that $\beta \ge 0$.

Consider first the case $s \ge 0$. Let $\varepsilon \in \mathbb{R}^*_+$. Noting that by definition of $A$, for all $z \in D(s)$, $(z, s(z)) \in A$, we get $\varphi_1(z) + (\beta + \varepsilon)\, s(z) \ge 0$. This implies that the continuous linear functional $-\frac{1}{\beta+\varepsilon}\varphi_1$ lies below $s$ on $K$, and by definition of $t$, $t(k_0) \ge -\frac{1}{\beta+\varepsilon}\varphi_1(k_0)$. This leads to $\varphi_1(k_0) + (\beta+\varepsilon)\, t(k_0) \ge 0$ for all $\varepsilon > 0$, which contradicts (B.1).

For a general $s$, consider the functional $\bar{s} \equiv s - f_0$, where $f_0$ is some continuous linear functional lying below $s$ on $K$ (the condition $D(s) \ne \emptyset$ ensures its existence). The functional $\bar{s}$ is a nonnegative l.s.c. sublinear functional on $K$ such that $D(\bar{s}) \ne \emptyset$. The first part of the proof may be applied and we know that $\bar{t}(k) \equiv \sup\{l(k),\ l \in \check{Y},\ l|_K \le \bar{s}\} = \bar{s}(k)$. It is clear that $\bar{t} = t - f_0$, hence $s = t$ on $K$.

Corollary B.2 With the same notation as in Lemma B.1, if $K \supseteq \check{X}_-$ and $s \le 0$ on $\check{X}_-$, then for all $k \in K$, $s(k) = \sup\{l(k),\ l \in \check{Y}_+,\ l|_K \le s\}$. Moreover, if there exists $f \in \check{Y}$, $f > 0$, $f|_K \le s$, then $s(k) = \sup\{l(k),\ l \in \check{Y},\ l > 0,\ l|_K \le s\}$.

Proof Let $l \in \check{Y}$, $l|_K \le s$. If $K \supseteq \check{X}_-$ and $s \le 0$ on $\check{X}_-$, then for all $x \in \check{X}_-$, $\langle x, l\rangle_{\check{X},\check{Y}} \le 0$, which means that $l \in \check{Y}_+$. Now, suppose that $L \equiv \{f \in \check{Y},\ f > 0,\ f|_K \le s\} \ne \emptyset$. Let $f \in L$. For all $l \in \check{Y}_+$, $l|_K \le s$, $\left(\frac{1}{n} f + \left(1 - \frac{1}{n}\right) l\right)_n$ is a sequence of elements of $L$, and for all $k \in K$, $\left\langle k, \frac{1}{n} f + \left(1 - \frac{1}{n}\right) l\right\rangle \to_n \langle k, l\rangle$.

References

Adler, I. and Gale, D. (1997), Arbitrage and growth rate for riskless investments in a stationary economy. Math. Fin. 2, 73–81.
Back, K. and Pliska, S.R. (1990), On the fundamental theorem of asset pricing with an infinite state space. J. Math. Econ. 20, 1–18.
Bensaïd, B., Lesne, J.-P., Pagès, H. and Scheinkman, J. (1992), Derivative asset pricing with transaction costs. Math. Fin. 2, 63–86.
Choulli, T. and Stricker, C. (1997), Séparation d'une sur- et d'une sousmartingale par une martingale. Thèse de T. Choulli. Université de Franche-Comté.
Cvitanić, J. and Karatzas, I. (1993), Hedging contingent claims with constrained portfolios. Ann. App. Prob. 3(3), 652–81.
Cvitanić, J. and Karatzas, I. (1996), Hedging and portfolio optimization under transaction costs: a martingale approach. Math. Fin. 6, 133–66.
Dalang, R.C., Morton, A. and Willinger, W. (1989), Equivalent martingale measures and no arbitrage in stochastic securities market models. Stochastics and Stochastic Rep. 29, 185–202.
Debreu, G. (1959), Theory of Value. Wiley, New York.
Delbaen, F. (1992), Representing martingale measures when asset prices are continuous and bounded. Math. Fin. 2, 107–30.
Delbaen, F., Kabanov, Y. and Valkeila, E. (2001), Hedging under transaction costs in currency markets: a discrete-time model. To appear in Math. Fin.
Delbaen, F. and Schachermayer, W. (1994), A general version of the fundamental theorem of asset pricing. Math. Ann. 300, 463–520.
Delbaen, F. and Schachermayer, W. (1998), The fundamental theorem of asset pricing for unbounded stochastic processes. Math. Ann. 312, 215–50.
Duffie, D. and Huang, C. (1986), Multiperiod security markets with differential information: martingales and resolution times. J. Math. Econ. 15, 283–303.
Dybvig, P. and Ross, S. (1987), Arbitrage, in: Eatwell, J., Milgate, M. and Newman, P., eds., The New Palgrave: A Dictionary of Economics, vol. 1. Macmillan, London, 100–6.
El Karoui, N. and Quenez, M.-C. (1995), Dynamic programming and pricing of contingent claims in an incomplete market. SIAM J. Control and Optimization 33, 29–66.

Föllmer, H. and Kramkov, D. (1997), Optional decomposition under constraints. Prob. Theory Relat. Fields 109, 1–25.
Harrison, M. and Kreps, D. (1979), Martingales and arbitrage in multiperiod security markets. J. Econ. Theory 20, 381–408.
Harrison, M. and Pliska, S. (1981), Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes Appl. 11, 215–60.
Jacod, J. (1979), Calcul Stochastique et Problèmes de Martingales. Springer, Berlin.
Jouini, E. (2000), Price functionals with bid–ask spreads. An axiomatic approach. J. Math. Econ. 34, 547–58.
Jouini, E. and Kallal, H. (1995a), Martingales and arbitrage in securities markets with transaction costs. J. Econ. Theory 66, 178–97.
Jouini, E. and Kallal, H. (1995b), Arbitrage in securities markets with short-sales constraints. Math. Fin. 5, 197–232.
Jouini, E. and Kallal, H. (1999), Viability and equilibrium in securities markets with frictions. Math. Fin. 9(3), 275–92.
Jouini, E., Kallal, H. and Napp, C. (2000), Arbitrage and viability in securities markets with fixed transaction costs. To appear in J. Math. Econ.
Jouini, E. and Napp, C. (2001), Arbitrage and investment opportunities. To appear in Finance and Stochastics.
Jouini, E., Napp, C. and Schachermayer, W. (2000), Arbitrage and state price deflators in a general intertemporal framework. Preprint.
Kabanov, Y. (1999), Hedging and liquidation under transaction costs in currency markets. Finance and Stochastics 3(2), 237–48.
Karatzas, I. and Shreve, S. (1988), Brownian Motion and Stochastic Calculus (Graduate Texts in Mathematics, Vol. 113). Springer-Verlag, Berlin.
Koehl, P.-F. and Pham, H. (2000), Sublinear price functionals under portfolio constraints. J. Math. Econ. 33(3), 339–51.
Kreps, D. (1981), Arbitrage and equilibrium in economies with infinitely many commodities. J. Math. Econ. 8, 15–35.
Lakner, P. (1993), Martingale measures for a class of right-continuous processes. Math. Fin. 3(1), 43–53.
Napp, C. (2000), Pricing issues with investment flows. Applications to market models with frictions. To appear in J. Math. Econ.
Schachermayer, W. (1994), Martingale measures for discrete time processes with infinite horizon. Math. Fin. 4, 25–55.
Stricker, C. (1990), Arbitrage et lois de martingale. Ann. Inst. Henri Poincaré, vol. 26, 451–60.
Yan, J.A. (1980), Caractérisation d'une classe d'ensembles convexes de $L^1$ ou $H^1$. Sém. de Probabilités. Lecture Notes in Mathematics XIV 784, 220–2.

3 American Options: Symmetry Properties

Jérôme Detemple

1 Introduction Put–call symmetry (PCS) holds when the price of a put option can be deduced from the price of a call option by relabeling its arguments. For instance, in the context of the standard financial market model with constant coefficients the value of an American put equals the value of an American call with strike price S, maturity date T , in a financial market with interest rate δ and in which the underlying asset price pays dividends at the rate r . This result was originally demonstrated by McDonald and Schroder (1990, 1998) using a binomial approximation of the lognormal model and by Bjerksund and Stensland (1993) in the continuous time model using PDE methods; it is a version of the international put–call equivalence (Grabbe (1983)). Put–call symmetry is a useful property of options since it reduces the computational burden in implementations of the model. Indeed, a consequence of the property is that the same numerical algorithm can be used to price put and call options and to determine their associated optimal exercise policy. Another benefit is that it reduces the dimensionality of the pricing problem for some payoff functions. Examples include exchange options and quanto options. PCS also provides useful insights about the economic relationship between contracts. Puts and calls, forward prices and discount bonds, exchange options and standard options are simple examples of derivatives that are closely connected by symmetry relations. Some intuition for PCS is based on the properties of the normal distribution. Indeed, in the model with constant coefficients the distribution of the terminal stock price is lognormal. Symmetry of the put and call option payoff function combined with the symmetry of the normal distribution then suggest that the put and call values can be deduced from each other by interchanging the arguments of the pricing functions. This can be verified directly from the valuation formulas for standard European and American options. 
As demonstrated by Gao, Huang and Subrahmanyam (2000) it is also true for European and American barrier options, such as down and out call and up and out put options, in the model with constant coefficients.

Since option values depend only on the volatility of the underlying asset price it seems reasonable to conjecture that PCS will hold in diffusion models in which the drift is an arbitrary function of the asset price but the volatility is a symmetric function of the price. This intuition is exploited by Carr and Chesney (1994) who show that PCS indeed extends to such a setting. Since alternative assumptions about the behavior of the underlying asset price destroy the symmetry of the terminal price distribution it would appear that the property cannot hold in more general contexts. Somewhat surprisingly, Schroder (1999), relying on a change of numeraire introduced by Geman, El Karoui and Rochet (1995), is able to show that the result holds in very general environments including models with stochastic coefficients and discontinuous underlying asset price processes.¹

This chapter surveys the latest results in the field and provides further extensions. Our basic market structure is one in which the underlying asset price follows an Itô process with progressively measurable coefficients (including the dividend rate) and the interest rate is an adapted stochastic process. We show that a version of PCS holds under these general market conditions. One feature behind the property is the homogeneity of degree one of the put and call payoff functions with respect to the stock price and the exercise price. For such payoffs the standard symmetry property of prices follows from a simple change of measure which amounts to taking the asset price as numeraire. The identification of the change of numeraire as a central feature underlying the standard PCS property permits the extension of the result to more complex contracts which involve liquidation provisions.
¹ Symmetry results in general market environments are also reported in Kholodnyi and Price (1998). Their proofs are based on no-arbitrage arguments and use operator theory and group theory notions.

A random maturity option is an option (put or call) which is automatically liquidated at a prespecified random time and, in such an event, pays a prespecified random cash flow. A typical example is a down and out put option with barrier $L$. This option expires automatically if the underlying asset price hits the level $L$ (null liquidation payoff), but pays off $(K - S)^+$ if exercised prior to expiration. Put–call symmetry for random maturity options states that the value of an American put with strike price $K$, maturity date $T$, automatic liquidation time $\tau_l$ and liquidation payoff $H_{\tau_l}$ equals the value of an American call with strike $S$, maturity date $T$, automatic liquidation time $\tau^*_l$ and liquidation payoff $H^*_{\tau_l}$ in an auxiliary financial market with interest rate $\delta$ and in which the underlying asset price pays dividends at the rate $r$ and has initial value $K$. The liquidation characteristics $\tau^*_l$ and $H^*_{\tau_l}$ of the equivalent call can be expressed in terms of the put specifications $K$, $\tau_l$ and $H_{\tau_l}$ and the initial value of the underlying asset $S$. For a down and out put option with barrier $L$ which has characteristics $\tau_L = \inf\{t \in [0,T] : S_t = L\}$ and $H_{\tau_L} = 0$ the equivalent up and out call has characteristics

$$\tau^*_L = \tau_{L^*} = \inf\left\{t \in [0,T] : S^*_t = L^* \equiv \frac{KS}{L}\right\} \quad \text{and} \quad H^*_{\tau_L} = 0,$$

where $S^*$ denotes the price of the underlying asset in the auxiliary financial market.

Contingent claims which are written on multiple assets also exhibit symmetry properties when their payoff is homogeneous of degree one. In fact the same change of measure argument as in the one asset case identifies classes of contracts which are related by symmetry and therefore can be priced off each other. In particular, for contracts on two underlying assets, we show that American call max-options are symmetric to American options to exchange the maximum of an asset and cash against another asset, that American exchange options are symmetric to standard call or put options (on a single underlying asset) and that American capped exchange options with proportional cap are symmetric to both capped call options with constant caps and capped put options with proportional caps. In all of these relationships the symmetric contract is valued in an auxiliary financial market with suitably adjusted interest rate and underlying asset prices.

We then discuss extensions of the property to a class of contracts analyzed recently in the literature, namely occupation time derivatives. These contracts, typically, depend on the amount of time spent by the underlying asset price in certain prespecified regions of the state space. Examples of such path-dependent contracts are Parisian and cumulative barrier options (Chesney, Jeanblanc-Picqué and Yor (1997)), step options (Linetsky (1999)) and quantile options (Miura (1992)). More general payoffs based on the occupation time of a constant set, above or below a barrier, are discussed in Hugonnier (1998). While the literature has focused exclusively on European-style contracts in the context of models with geometric Brownian motion price processes, we consider American-style occupation time derivatives in models with Itô price processes.
We also allow for occupation times of random sets. We show that occupation time derivatives with homogeneous payoff functions satisfy a symmetry property in which the symmetric contract depends on the occupation time of a suitably adjusted random set. Extensions to multiasset occupation time derivatives are also presented.

Symmetry-like properties also hold when the contract under consideration is homogeneous of degree $\nu \ne 1$. In this instance the interest rate in the auxiliary economy depends on the coefficient $\nu$, the interest rate in the original economy and the dividend rate and volatility coefficients of the numeraire asset in the original economy. The dividend rates of other assets in the new numeraire are also suitably adjusted.

Since symmetry properties reflect the passage to a new numeraire asset it is of interest to examine the replicability of attainable payoffs under changes of numeraire. For the case of nondividend paying assets Geman, El Karoui and Rochet (1995) have established that contingent claims that are attainable in one numeraire are also attainable in any other numeraire and that the replicating portfolios are the same. We show that these results extend to the case of dividend-paying assets. This demonstrates that any symmetric contract can indeed be attained in the appropriate auxiliary economy with new numeraire and that its price satisfies the usual representation formula involving the pricing measure and the interest rate that characterize the auxiliary economy.

The second section reviews the property in the context of the standard model with constant coefficients. In Section 3 PCS is extended to a financial market model with Brownian filtration and stochastic opportunity set. The Markovian model with diffusion price process (and general volatility structure) is examined as a subcase of the general model. Extensions to random maturity options, multiasset contingent claims, occupation time derivatives and payoffs that are homogeneous of degree $\nu$ are carried out in Sections 4–7. Questions pertaining to changes of numeraire, replicating portfolios and representation of asset prices are examined in Section 8. Concluding remarks are formulated last.

2 Put–call symmetry in the standard model We consider the standard financial market model with constant coefficients (constant opportunity set). The underlying asset price, S, follows a geometric Brownian motion process

z t ], t ∈ [0, T ]; S0 given d St = St [(r − δ)dt + σ d

(1)

where the coefficients (r, δ, σ ) are constant. Here r represents the interest rate, δ the dividend rate and σ the volatility of the asset price. The asset price process (1) is represented under the equivalent martingale measure Q: the process z is a Q-Brownian motion. In this complete financial market it is well known that the price of any contingent claim can be obtained by a no-arbitrage argument. In particular the value of a European call option with strike price K and maturity date T is given by the Black

3. American Options: Symmetry Properties


and Scholes (1973) formula

$$c(S_t, K, r, \delta, t) = S_t e^{-\delta(T-t)} N\big(d(S_t, K, r, \delta, T-t)\big) - K e^{-r(T-t)} N\big(d(S_t, K, r, \delta, T-t) - \sigma\sqrt{T-t}\big) \tag{2}$$

where

$$d(S, K, r, \delta, T-t) = \frac{\log(S/K) + (r - \delta + \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}. \tag{3}$$

Similarly the value of a European put with the same characteristics (K, T) is

$$p(S_t, K, r, \delta, t) = K e^{-r(T-t)} N\big(-d(S_t, K, r, \delta, T-t) + \sigma\sqrt{T-t}\big) - S_t e^{-\delta(T-t)} N\big(-d(S_t, K, r, \delta, T-t)\big). \tag{4}$$
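As a numerical illustration, formulas (2)–(4) can be evaluated directly. The sketch below is only a minimal check under assumed parameter values (the numbers and the function names `european_call`, `european_put`, `d_term` and `norm_cdf` are illustrative and not part of the text). It prices a put and then prices a call after exchanging the roles of (S, K) and of (r, δ), which is the comparison the symmetry property in this section relies on.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d_term(s, k, r, delta, sigma, tau):
    """The quantity d(S, K, r, delta, T - t) of equation (3), with tau = T - t."""
    return (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))


def european_call(s, k, r, delta, sigma, tau):
    """European call value, equation (2)."""
    d = d_term(s, k, r, delta, sigma, tau)
    return (s * math.exp(-delta * tau) * norm_cdf(d)
            - k * math.exp(-r * tau) * norm_cdf(d - sigma * math.sqrt(tau)))


def european_put(s, k, r, delta, sigma, tau):
    """European put value, equation (4)."""
    d = d_term(s, k, r, delta, sigma, tau)
    return (k * math.exp(-r * tau) * norm_cdf(-d + sigma * math.sqrt(tau))
            - s * math.exp(-delta * tau) * norm_cdf(-d))


# Illustrative inputs (assumed, not taken from the text).
spot, strike, rate, div, vol, tau = 100.0, 95.0, 0.06, 0.02, 0.25, 0.75

put_value = european_put(spot, strike, rate, div, vol, tau)
# Same call formula with (S, K, r, delta) replaced by (K, S, delta, r).
swapped_call_value = european_call(strike, spot, div, rate, vol, tau)

print(put_value, swapped_call_value)
```

The two printed numbers should agree up to floating-point rounding; the volatility is left unchanged by the swap.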

Comparison of these two formulas leads to the following symmetry property:

Theorem 1 (European PCS) Consider European put and call options with identical characteristics K and T written on an asset with price S given by (1). Let p(S, K, r, δ, t) and c(S, K, r, δ, t) denote the respective price functions. Then

$$p(S, K, r, \delta, t) = c(K, S, \delta, r, t). \tag{5}$$

Proof of Theorem 1 Substituting (K, S, δ, r) for (S, K, r, δ) in (2) and using

$$\begin{aligned} d(K, S, \delta, r, T-t) &= \frac{\log(K/S) + (\delta - r + \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}} \\ &= -\frac{\log(S/K) + (r - \delta + \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}} + \sigma\sqrt{T-t} \\ &= -d(S, K, r, \delta, T-t) + \sigma\sqrt{T-t} \end{aligned} \tag{6}$$

gives the desired result.

This result shows that the put value in the financial market under consideration is the same as the value of a call option with strike price S and maturity date T in an economy with interest rate δ and in which the underlying asset price follows a geometric Brownian motion process with dividend rate r, volatility σ and initial value K, under the risk neutral measure.

This symmetry property between the value of puts and calls is even more striking when we consider American options. For these contracts Kim (1990), Jacka (1991) and Carr, Jarrow and Myneni (1992) have shown that the value of a call has the early exercise premium representation (EEP)


J. Detemple

$$C(S_t, K, r, \delta, t, B^c(\cdot)) = c(S_t, K, r, \delta, t) + \pi(S_t, K, r, \delta, t, B^c(\cdot)) \tag{7}$$

where $C(S, K, r, \delta, t, B^c(\cdot))$ is the value of the American call, $c(S, K, r, \delta, t)$ represents the value of the European call in (2) and $\pi(S, K, r, \delta, t, B^c(\cdot))$ is the early exercise premium

$$\pi(S_t, K, r, \delta, t, B^c(\cdot)) = \int_t^T \phi(S_t, K, r, \delta, v - t, B_v^c)\,dv \tag{8}$$

with

$$\phi(S_t, K, r, \delta, v - t, B_v^c) = \delta S_t e^{-\delta(v-t)} N\big(d(S_t, B_v^c, r, \delta, v - t)\big) - r K e^{-r(v-t)} N\big(d(S_t, B_v^c, r, \delta, v - t) - \sigma\sqrt{v - t}\big). \tag{9}$$

The exercise boundary $B^c(\cdot)$ of the call option solves the recursive integral equation

$$B_t^c - K = C(B_t^c, K, r, \delta, t, B^c(\cdot)) \tag{10}$$

subject to the boundary condition $B_T^c = \max\left(K, \frac{r}{\delta}K\right)$. Let $B^c(K, r, \delta, t)$ denote the solution.

The EEP representation for the American put can be obtained by following the same approach as for the call. Alternatively the put value can be deduced from the call formula by appealing to the following result (McDonald and Schroder (1998)).

Theorem 2 (American PCS) Consider American put and call options with identical characteristics K and T written on an asset with price S given by (1). Let $P(S, K, r, \delta, t, B^p(\cdot))$ and $C(S, K, r, \delta, t, B^c(\cdot))$ denote the respective price functions and $B^p(K, r, \delta, \cdot)$ and $B^c(S, r, \delta, \cdot)$ the corresponding immediate exercise boundaries. Then

$$P(S, K, r, \delta, t, B^p(K, r, \delta, \cdot)) = C(K, S, \delta, r, t, B^c(S, \delta, r, \cdot)) \tag{11}$$

and for all $t \in [0, T]$

$$B^p(K, r, \delta, t) = \frac{SK}{B^c(S, \delta, r, t)}. \tag{12}$$

This result can again be demonstrated by substitution along the lines of the proof of Theorem 1. A more elegant approach relies on a change of measure detailed in the next section. Hence, even for American options the value of a put is the same as the value of a call with strike S, maturity date T , in an economy with interest rate δ and in which the underlying asset price, under the risk neutral measure, follows a geometric


Brownian motion process with dividend rate r , volatility σ and initial value K . Furthermore the exercise boundary for the American put equals the inverse of the exercise boundary for the American call with characteristics (S, δ, r ) multiplied by the product S K . Some intuition for this result rests on the properties of normal distributions. In models with constant coefficients (r, δ, σ ) the value of put and call options can be expressed in terms of the cumulative normal distribution. Combining the symmetry of the normal distribution with the symmetry of the put and call payoffs leads to the relationship between the option values and the exercise boundaries. A priori this intuition may suggest that the property does not extend beyond the financial market model with constant coefficients. As we show next this conjecture turns out to be incorrect.
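The American symmetry can also be observed numerically. The sketch below is an illustration only: it uses a Cox–Ross–Rubinstein binomial tree, which is a swapped-in approximation technique and not the integral-equation method of (7)–(10), and the function name `crr_american`, the parameter values and the number of steps are assumptions made for the example. It prices an American put with characteristics (S, K, r, δ) and an American call with characteristics (K, S, δ, r) on trees with the same volatility.

```python
import math


def crr_american(spot, strike, rate, div, sigma, maturity, steps, kind):
    """American option value on a CRR binomial tree (kind is "call" or "put")."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    # Risk-neutral up probability for an asset paying a continuous dividend.
    prob = (math.exp((rate - div) * dt) - down) / (up - down)
    disc = math.exp(-rate * dt)
    sign = 1.0 if kind == "call" else -1.0

    def exercise(n, j):
        """Immediate exercise value after n steps, j of which are up moves."""
        price = spot * up ** j * down ** (n - j)
        return max(sign * (price - strike), 0.0)

    values = [exercise(steps, j) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        # Compare continuation value with immediate exercise at every node.
        values = [
            max(disc * (prob * values[j + 1] + (1.0 - prob) * values[j]), exercise(n, j))
            for j in range(n + 1)
        ]
    return values[0]


# Illustrative inputs (assumed, not taken from the text).
spot, strike, rate, div, vol, maturity, steps = 100.0, 95.0, 0.06, 0.02, 0.25, 0.75, 200

american_put = crr_american(spot, strike, rate, div, vol, maturity, steps, "put")
# Swap (S, K) and (r, delta): call with strike S on an asset starting at K.
american_call_swapped = crr_american(strike, spot, div, rate, vol, maturity, steps, "call")

print(american_put, american_call_swapped)
```

On a tree with down = 1/up the two values coincide up to rounding at every step count, so the agreement does not depend on the tree having converged to the continuous-time price.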

3 Put–call symmetry with Itô price processes

In this section we demonstrate that a version of PCS holds under fairly general financial market conditions. The key to the approach is the adoption of the stock as a new numeraire. Changes of numeraire have been discussed thoroughly in the literature, in particular in Geman, El Karoui and Rochet (1995). The extension of options' symmetry properties to general uncertainty structures based on this change of numeraire is due to Schroder (1999). This section considers a special case of Schroder, namely a market with Brownian filtration.

Suppose we have an economy with finite time period [0, T], a complete probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\mathcal{F}_{(\cdot)}$. A Brownian motion process z is defined on $(\Omega, \mathcal{F})$ and takes values in $\mathbb{R}$. The filtration is the natural filtration generated by z and $\mathcal{F}_T = \mathcal{F}$. The financial market has a stochastic opportunity set and nonmarkovian price dynamics. The underlying asset price follows the Itô process,

$$dS_t = S_t\left[(r_t - \delta_t)\,dt + \sigma_t\,d\tilde{z}_t\right], \quad t \in [0, T]; \quad S_0 \text{ given} \tag{13}$$

under the Q-measure. The interest rate r, the dividend rate δ and the volatility coefficient σ are progressively measurable and bounded processes of the Brownian filtration $\mathcal{F}_{(\cdot)}$ generated by the underlying Brownian motion process z. The process $\tilde{z}$ is a Q-Brownian motion. At various stages of the analysis we will also be led to consider an alternative financial market with interest rate δ, in which the underlying asset price $S^*$ satisfies

$$dS_t^* = S_t^*\left[(\delta_t - r_t)\,dt + \sigma_t\,dz_t^*\right], \quad t \in [0, T]; \quad S_0^* \text{ given} \tag{14}$$


under some risk neutral measure $Q^*$. In this market the asset has dividend rate r and volatility coefficient σ. The process $z^*$ is a Brownian motion under the pricing measure $Q^*$. Both $z^*$ and $Q^*$ will be specified further as we proceed.

We first state a relationship between the values of European puts and calls in the general financial market model under consideration.

Theorem 3 (Generalized European PCS) Consider a European put option with characteristics K and T written on an asset with price S given by (13) in the market with stochastic interest rate r. Let $p(S, K, r, \delta; \mathcal{F}_t)$ denote the put price process. Then

$$p(S_t, K, r, \delta; \mathcal{F}_t) = c(S_t^*, S, \delta, r; \mathcal{F}_t) \tag{15}$$

where $c(S_t^*, S, \delta, r; \mathcal{F}_t)$ is the value of a call with strike price $S = S_t$ and maturity date T in a financial market with interest rate δ and in which the underlying asset price follows the Itô process (14) for $v \in [t, T]$ with initial value $S_t^* = K$ and with $z^*$ defined by

$$dz_v^* = -d\tilde{z}_v + \sigma_v\,dv \tag{16}$$

for $v \in [0, T]$, with $z_0^* = 0$.

This result extends the PCS property of the previous section to nonmarkovian economies with Itô price processes and progressively measurable interest rates. The key behind this general equivalence is a change of measure, detailed in the proof, which converts a put option in the original economy into a call option with symmetric characteristics in the auxiliary economy. Note that the equivalence is obtained by switching $(S, K, r, \delta)$ to $(S^*, S, \delta, r)$, but keeping the trajectories of the Brownian motion the same, i.e. the filtration which is used to compute the value of the call in the auxiliary financial market is the one generated by the original Brownian motion z. Thus information is preserved across economies. In effect the change of measure creates a new asset whose price is the inverse of the original asset price adjusted by a multiplicative factor which depends only on the initial conditions. As we shall see below in the context of diffusion models the change of measure is instrumental in proving the symmetry property without placing restrictions on the volatility coefficient.

Proof of Theorem 3 In the original financial market the value $p_t \equiv p(S_t, K, r, \delta; \mathcal{F}_t)$ of the put option with characteristics (K, T) has the (present value) representation

$$p_t = E\left[\exp\left(-\int_t^T r_v\,dv\right)\left(K - S_t\exp\left(\int_t^T \alpha_v\,dv + \int_t^T \sigma_v\,d\tilde{z}_v\right)\right)^+ \Big|\ \mathcal{F}_t\right]$$

where $\alpha \equiv r - \delta - \frac{1}{2}\sigma^2$ and the expectation is taken relative to the equivalent martingale measure Q. Simple manipulations show that the right hand side of this equation equals

$$E\left[\exp\left(-\int_t^T \left(\delta_v + \tfrac{1}{2}\sigma_v^2\right)dv + \int_t^T \sigma_v\,d\tilde{z}_v\right)\left(K\exp\left(-\int_t^T \alpha_v\,dv - \int_t^T \sigma_v\,d\tilde{z}_v\right) - S_t\right)^+ \Big|\ \mathcal{F}_t\right].$$

Consider the new measure

$$dQ^* = \exp\left(-\frac{1}{2}\int_0^T \sigma_v^2\,dv + \int_0^T \sigma_v\,d\tilde{z}_v\right)dQ \tag{17}$$

which is equivalent to Q. Girsanov's Theorem (1960) implies that the process

$$dz_v^* = -d\tilde{z}_v + \sigma_v\,dv \tag{18}$$

is a $Q^*$-Brownian motion. Substituting (18) in the put pricing formula and passing to the $Q^*$-measure yields

$$p_t = E^*\left[\exp\left(-\int_t^T \delta_v\,dv\right)\left(K\exp\left(\int_t^T \left(\delta_v - r_v - \tfrac{1}{2}\sigma_v^2\right)dv + \int_t^T \sigma_v\,dz_v^*\right) - S_t\right)^+ \Big|\ \mathcal{F}_t\right]. \tag{19}$$

But the right hand side is the value of a call option with strike $S = S_t$, maturity date T in an economy with interest rate δ, asset price with dividend rate r and initial value $S_t^* = K$, and pricing measure $Q^*$.

An even stronger version of the preceding result is obtained if the coefficients of the model are adapted to the subfiltration generated by the process $z^*$. Let $\mathcal{F}^*_{(\cdot)}$ denote the filtration generated by this $Q^*$-Brownian motion process.

Corollary 4 Suppose that the coefficients (r, δ, σ) are adapted to the filtration $\mathcal{F}^*_{(\cdot)}$. Then

$$p(S_t, K, r, \delta; \mathcal{F}_t) = c(S_t^*, S, \delta, r; \mathcal{F}_t^*)$$

where $c(S_t^*, S, \delta, r; \mathcal{F}_t^*)$ is the value of a call with strike price $S = S_t$ and maturity date T in a financial market with information filtration $\mathcal{F}^*_{(\cdot)}$ generated by the $Q^*$-Brownian motion process (16), interest rate δ and in which the underlying asset price follows the Itô process (14) with initial value $S_t^* = K$.


In the context of this corollary part of the information embedded in the original information filtration generated by the Brownian motion z may be irrelevant for pricing the put option. Since all the coefficients are adapted to the subfiltration generated by $z^*$ this is the only information which matters in computing the expectation under $Q^*$ in (19).

Remark 5 Note that the standard European PCS in the model with constant coefficients is a special case of this corollary. Indeed in this setting direct integration over $z^*$ leads to the call value in the auxiliary economy and the put value in the original economy.

Let us now consider the case of American options. For these contracts early exercise, prior to the maturity date T, is under the control of the holder. At any time prior to the optimal exercise time the put value $P_t \equiv P(S_t, K, r, \delta; \mathcal{F}_t)$ in the original economy is (see Bensoussan (1984) and Karatzas (1988))

$$P_t = \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right)\left(K - S_t\exp\left(\int_t^\tau \left(r_v - \delta_v - \tfrac{1}{2}\sigma_v^2\right)dv + \int_t^\tau \sigma_v\,d\tilde{z}_v\right)\right)^+ \Big|\ \mathcal{F}_t\right]$$

where $\mathcal{S}_{t,T}$ denotes the set of stopping times of the filtration $\mathcal{F}_{(\cdot)}$ with values in [t, T]. Using the same arguments as in the proof of Theorem 3 we can write

$$P_t = \sup_{\tau \in \mathcal{S}_{t,T}} E^*\left[\exp\left(-\int_t^\tau \delta_v\,dv\right)\left(K\exp\left(\int_t^\tau \left(\delta_v - r_v - \tfrac{1}{2}\sigma_v^2\right)dv + \int_t^\tau \sigma_v\,dz_v^*\right) - S_t\right)^+ \Big|\ \mathcal{F}_t\right]$$

where the expectation is relative to the equivalent measure $Q^*$ and conditional on the information $\mathcal{F}_t$. Since the change of measure performed does not affect the set of stopping times over which the holder optimizes the following result holds.

Theorem 6 (Generalized American PCS) Consider an American put option with characteristics K and T written on an asset with price S given by (13) in the market with stochastic interest rate r. Let $P(S, K, r, \delta; \mathcal{F}_t)$ denote the American put price process and $\tau^p(K, r, \delta)$ the optimal exercise time. Then, prior to exercise, the put price is

$$P(S_t, K, r, \delta; \mathcal{F}_t) = C(S_t^*, S, \delta, r; \mathcal{F}_t) \tag{20}$$

where $C(S_t^*, S, \delta, r; \mathcal{F}_t)$ is the value of an American call with strike price $S = S_t$ and maturity date T in a financial market with interest rate δ and in which the underlying asset price follows the Itô process (14) with initial value $S_t^* = K$ and with $z^*$ defined by (16). The optimal exercise time for the put option is

$$\tau^p(S, K, r, \delta) = \tau^c(K, S, \delta, r) \tag{21}$$

where $\tau^c(K, S, \delta, r)$ denotes the optimal exercise time for the call option.

Remark 7 Consider the model with constant coefficients (r, δ, σ). In this setting the optimal exercise time for the call option in the auxiliary financial market is

$$\tau^c(K, S, \delta, r) = \inf\left\{t \in [0, T] : K\exp\left(\left(\delta - r - \tfrac{1}{2}\sigma^2\right)t + \sigma z_t^*\right) = B^c(S, \delta, r, t)\right\}.$$

On the other hand the optimal exercise time for the put option in the original financial market is

$$\tau^p(S, K, r, \delta) = \inf\left\{t \in [0, T] : S\exp\left(\left(r - \delta - \tfrac{1}{2}\sigma^2\right)t + \sigma\tilde{z}_t\right) = B^p(K, r, \delta, t)\right\}$$

where $B^p(K, r, \delta, t)$ is the put exercise boundary. Using the definition of $z^*$ in (16) we conclude immediately that

$$B^p(K, r, \delta, t) = \frac{SK}{B^c(S, \delta, r, t)}.$$
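In the constant-coefficient case the change of measure can be checked draw by draw. Substituting (16) into the auxiliary price gives $S_t^* = SK/S_t$, and the discounted put payoff equals the call payoff in the auxiliary economy weighted by the density in (17). The sketch below is a minimal illustration under assumed parameter values; the function name `pathwise_gap` and all numbers are hypothetical, and a single horizon stands in for the whole path.

```python
import math
import random


def pathwise_gap(s0, k, r, delta, sigma, horizon, z):
    """Compare both sides of the change of measure for one draw z of the Q-Brownian motion.

    Returns (put side minus call side, S* minus S K / S_T).
    """
    s_t = s0 * math.exp((r - delta - 0.5 * sigma ** 2) * horizon + sigma * z)
    density = math.exp(-0.5 * sigma ** 2 * horizon + sigma * z)  # dQ*/dQ as in (17)
    z_star = -z + sigma * horizon                                # z* as in (16)
    s_star = k * math.exp((delta - r - 0.5 * sigma ** 2) * horizon + sigma * z_star)
    put_side = math.exp(-r * horizon) * max(k - s_t, 0.0)
    call_side = math.exp(-delta * horizon) * density * max(s_star - s0, 0.0)
    return put_side - call_side, s_star - s0 * k / s_t


# Illustrative inputs (assumed, not taken from the text).
rng = random.Random(0)
horizon = 0.75
gaps = [
    pathwise_gap(100.0, 95.0, 0.06, 0.02, 0.25, horizon, rng.gauss(0.0, math.sqrt(horizon)))
    for _ in range(1000)
]
max_payoff_gap = max(abs(g[0]) for g in gaps)
max_inverse_gap = max(abs(g[1]) for g in gaps)

print(max_payoff_gap, max_inverse_gap)
```

Both gaps should be zero up to rounding for every draw, which is why the relation between the exercise boundaries follows immediately from the definition of $z^*$.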

3.1 Diffusion financial market models

Suppose that the stock price satisfies the stochastic differential equation

$$dS_t = S_t\left[(r(S_t, t) - \delta(S_t, t))\,dt + \sigma(S_t, t)\,d\tilde{z}_t\right], \quad t \in [0, T]; \quad S_0 \text{ given} \tag{22}$$

under the Q-measure. In this market the interest rate r may depend on the stock price and along with the other coefficients of (22) satisfies appropriate Lipschitz and growth conditions for the existence of a unique strong solution (see Karatzas and Shreve (1988)). We assume that the solution is continuous relative to the initial conditions.

Since this markovian financial market is a special case of the general model of the previous section PCS holds. However, in the model under consideration the exercise regions of options have a simple structure which leads to a clear comparison between the put and the call exercise policies. Define the discount factor

$$R_{t,s} = \exp\left(-\int_t^s r(S_v, v)\,dv\right)$$

for $t, s \in [0, T]$ and the Q-martingale

$$M_{t,s} \equiv \exp\left(-\frac{1}{2}\int_t^s \sigma(S_v, v)^2\,dv + \int_t^s \sigma(S_v, v)\,d\tilde{z}_v\right)$$

for $t, s \in [0, T]$, $s \ge t$. Consider an American call option and let $\mathcal{E}$ denote the exercise set. Continuity of the strong solution of (22) relative to the initial conditions implies that the option price is continuous and that the exercise region is a closed set. Thus we can meaningfully define its boundary $B^c$.² Let $\mathcal{E}(t)$ denote the t-section of the exercise region. The EEP representation for a call option with strike K and maturity date T is

$$C(S_t, K, r, \delta, t, B^c(\cdot)) = c(S_t, K, r, \delta, t) + \pi(S_t, K, r, \delta, t, B^c(\cdot)) \tag{23}$$

where $C(S, K, r, \delta, t, B^c(\cdot))$ is the value of the American call, $c(S, K, r, \delta, t)$ represents the value of the European call

$$c(S_t, K, r, \delta, t) = E\left[\left(S_t\exp\left(-\int_t^T \delta(S_v, v)\,dv\right)M_{t,T} - K R_{t,T}\right)^+ \Big|\ S_t\right] \tag{24}$$

and $\pi_t \equiv \pi(S_t, K, r, \delta, t, B^c(\cdot))$ is the early exercise premium

$$\pi_t = E\left[\int_t^T \left(\delta(S_s, s)\,S_t\exp\left(-\int_t^s \delta(S_v, v)\,dv\right)M_{t,s} - r(S_s, s)\,K R_{t,s}\right)1_{\{S_s \in \mathcal{E}(s)\}}\,ds\ \Big|\ S_t\right]. \tag{25}$$

In these expressions dependence on r and δ is meant to represent dependence on the functional form of r(·) and δ(·). The boundary $B^c(\cdot)$ of the exercise set for the call option solves the recursive integral equation

$$B_t^c - K = C(B_t^c, K, r, \delta, t, B^c(\cdot)) \tag{26}$$

subject to the boundary condition $B_T^c = \max\big(K, (r(B_T^c, T)/\delta(B_T^c, T))K\big)$. Let $B^c(K, r, \delta, t)$ denote the solution. The optimal exercise policy for the call is to exercise at the stopping time

$$\tau^c(S, K, r, \delta) = \inf\left\{t \in [0, T] : S R_{0,t}^{-1}\exp\left(-\int_0^t \delta(S_v, v)\,dv\right)M_{0,t} = B^c(K, r, \delta, t)\right\}. \tag{27}$$

² If the exercise region is up-connected the exercise boundary is unique. Failure of this property may imply the existence of multiple boundaries.


In this context put–call symmetry leads to

Proposition 8 Consider an American put option with characteristics K and T written on an asset with price S given by (22) in the market with interest rate r(S, t). Let $P(S, K, r, \delta, t)$ denote the American put price process and $\tau^p(S, K, r, \delta)$ the optimal exercise time. Then, prior to exercise, the put price is

$$P(S_t, K, r, \delta, t) = C(S_t^*, S, \delta, r, t) \tag{28}$$

where $C(S_t^*, S, \delta, r, t)$ is the value of an American call with strike price $S = S_t$ and maturity date T in a financial market with stochastic interest rate δ and in which the underlying asset price $S^*$ satisfies the stochastic differential equation

$$dS_v^* = S_v^*\left[\left(\delta\left(\frac{SK}{S_v^*}, v\right) - r\left(\frac{SK}{S_v^*}, v\right)\right)dv + \sigma\left(\frac{SK}{S_v^*}, v\right)dz_v^*\right], \quad \text{for } v \in [t, T] \tag{29}$$

with initial value $S_t^* = K$ and with $z^*$ defined by (16). The optimal exercise time for the put option is $\tau^p(S, K, r, \delta) = \tau^c(K, S, \delta, r)$ and the exercise boundaries are related by

$$B^p(K, r, \delta, t) = \frac{SK}{B^c(S, \delta, r, t)}. \tag{30}$$

In the financial market setting of (22) all the information relevant for future payoffs is embedded in the current stock price. Any strictly monotone transformation of the price is also a sufficient statistic. Thus the passage from the original economy to the auxiliary economy with stock price (29) preserves the information required to price derivatives with future payoffs. No information beyond the current price $S_t^*$ is required to assess the correct evolution of the coefficients of the underlying asset price process. This stands in contrast with the general model with Itô price processes in which the path of the Brownian motion needs to be recorded in the auxiliary economy for proper evaluation of future distributions.

Note also that the change of measure converts the original underlying asset into a symmetric asset with inverse price up to a multiplicative factor depending only on the initial conditions. Since the change of measure can be performed independently of the structure of the coefficients the results are valid even in the absence of symmetry-like restrictions on the volatility coefficient.

Proof of Proposition 8 The first part of the proposition follows from Theorem 6. To prove the relationship between the exercise boundaries note that the call boundary at maturity equals

$$B^c = \max(S, b^c)$$

where $b^c$ solves the nonlinear equation

$$r\left(\frac{SK}{b^c}, T\right)b^c - \delta\left(\frac{SK}{b^c}, T\right)S = 0.$$

In this expression we used the relation $S_T = SK/S_T^*$. Now with the change of variables $b^p = SK/b^c$ it is clear that $b^p$ solves

$$r(b^p, T)K - \delta(b^p, T)b^p = 0$$

and that the put boundary at the maturity date satisfies (30). To establish the relation prior to the maturity date it suffices to use the recursive integral equation for the call boundary, pass to the $Q^*$-measure and perform the change of variables indicated. The resulting expression is the recursive integral equation for the put boundary.

The results in this section can be easily extended to multivariate diffusion models (S, Y) where Y is a vector of state variables impacting the coefficients of the underlying asset price process. Passage to the measure $Q^*$, in this case, introduces a risk premium correction in the state variables processes. Multivariate models in that class are discussed extensively in Schroder (1999).
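The boundary relation at the maturity date can be illustrated in the simplest setting. The sketch below assumes constant, strictly positive coefficients r and δ, uses the call boundary condition max(K, (r/δ)K) stated for the standard model, and assumes the corresponding limit min(K, (r/δ)K) for the put; the function names and the test cases are hypothetical. It checks that the put boundary equals S K divided by the boundary of the call with characteristics (S, δ, r).

```python
def call_boundary_at_maturity(strike, rate, div):
    """Call exercise boundary at maturity: max(strike, (rate / div) * strike)."""
    return max(strike, rate / div * strike)


def put_boundary_at_maturity(strike, rate, div):
    """Put exercise boundary at maturity (assumed standard limit): min(strike, (rate / div) * strike)."""
    return min(strike, rate / div * strike)


def boundary_gap(s, k, r, delta):
    """Put boundary minus S K / (boundary of the call with strike S, interest delta, dividend r)."""
    symmetric_call_boundary = call_boundary_at_maturity(s, delta, r)
    return put_boundary_at_maturity(k, r, delta) - s * k / symmetric_call_boundary


# Illustrative cases (assumed): r > delta, r < delta and r = delta.
cases = [
    (100.0, 95.0, 0.06, 0.02),
    (100.0, 95.0, 0.02, 0.06),
    (80.0, 120.0, 0.05, 0.05),
]
boundary_gaps = [boundary_gap(*case) for case in cases]

print(boundary_gaps)
```

In each case the gap vanishes up to rounding, whichever of the two terms inside the max and min is binding.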

4 Options with random expiration dates

We now consider a class of American derivatives which mature automatically if certain prespecified conditions are satisfied. Let $\tau_l$ denote a stopping time of the filtration and let $H = \{H_t : t \in [0, T]\}$ denote a progressively measurable process. A call option with maturity date T, strike K, automatic liquidation time $\tau_l$ and liquidation payoff H pays $(S - K)^+$ if exercised by the holder at date $t < \tau_l$. If $\tau_l$ materializes prior to T the option automatically matures and pays off $H_{\tau_l}$. A random maturity put option with characteristics $(K, T, \tau_l, H)$ has similar provisions but pays $(K - S)^+$ if exercised prior to the automatic liquidation time $\tau_l$. Options with such characteristics are referred to as random maturity options.

Popular examples of such contracts are barrier options such as down and out put options and up and out call options. Both of these contracts become worthless when the underlying asset price reaches a prespecified level L (i.e. the liquidation payoff is a constant H = 0). Another example is an American capped call option with automatic exercise at the cap L. This option is automatically liquidated at the random time $\tau_l = \tau_L \equiv \inf\{t \in [0, T] : S_t = L\}$ or $\tau_L = \infty$ if no such time materializes in [0, T] and pays off the constant $H = L - K$ in that event. If $\tau_L > T$ the option payoff is $(S - K)^+$.³ Capped options with growing caps and automatic exercise at the cap are examples in which the automatic liquidation payoff is time dependent.

Consider again the general financial market model with underlying asset price given by (13). Recall the definitions of the discount factor

$$R_{t,s} = \exp\left(-\int_t^s r_v\,dv\right)$$

for $t, s \in [0, T]$ and the Q-martingale

$$M_{t,s} \equiv \exp\left(-\frac{1}{2}\int_t^s \sigma_v^2\,dv + \int_t^s \sigma_v\,d\tilde{z}_v\right)$$

for $t, s \in [0, T]$, $s \ge t$. Let $P_t = P(S, K, T, \tau_l, H, r, \delta; \mathcal{F}_t)$ denote the value of an American random maturity put with characteristics $(K, T, \tau_l, H)$. In this financial market the put value is given by

+ τ −1 Pt = sup E Rt,τ K − St Rt,τ exp − δ v dv Mt,τ 1{τ 0 where A± (L) is defined above.

Again the PCS relation (39) holds in this case. Put and call step options are special cases of the occupation time derivatives in which the payoff function involves exponential discounting. Closed form solutions are provided by Linetsky for geometric Brownian motion price process.

Occupation time derivatives can be easily generalized to the multiasset case. For a progressively measurable stochastic closed set $A \in \mathbb{R}^n_+$ and a vector of asset prices $S \in B(\mathbb{R}^n_+)$ a multiasset f-claim has payoff $f(S, K, O^{S,A})$ where

$$O_t^{S,A} = \int_0^t 1_{\{S_v \in A_v\}}\,dv, \quad t \in [0, T].$$

A natural generalization of Theorem 13 is

Theorem 15 Consider an American occupation time f-claim with maturity date T and a payoff function $f(S, K, O^{S,A})$ which is homogeneous of degree one in (S, K). Let $V(S, K, O^{S,A}, r, \delta; \mathcal{F}_t)$ denote the value of the claim in the financial market with filtration $\mathcal{F}_{(\cdot)}$, asset prices S satisfying (37) and progressively measurable interest rate r. Pick some arbitrary index j and define

$$\lambda_j \equiv \frac{K}{S^j} \quad \text{and} \quad \lambda_j(\delta) \equiv \frac{r}{\delta^j}.$$

Prior to exercise the value of the multiasset occupation time f-claim is

$$V(S_t, K, O^{S,A}, r, \delta; \mathcal{F}_t) = V^j(S_t^*, S^j, O^{S^*,A^*}, \delta^j, \lambda_j(\delta) \circ_j \delta; \mathcal{F}_t)$$

where $A^* = \{A^*(v, \omega), v \in [t, T]\}$ with $A^*(v, \omega) = \{x \in \mathbb{R}^n_+ : x_i = y_i S^j/y_j \text{ for } i \ne j,\ x_j = K S^j/y_j \text{ and } y = (y_1, \ldots, y_n) \in A(v, \omega)\}$ and $O_t^{S^*,A^*} \equiv O_t^{S,A}$. Also $V^j(S_t^*, S^j, O_t^{S^*,A^*}, \delta^j, \lambda_j(\delta) \circ_j \delta; \mathcal{F}_t)$ is the value of the $f^j$-claim with parameter $S^j = S_t^j$, maturity date T and occupation time $O_t^{S^*,A^*}$ in an auxiliary financial market with interest rate $\delta^j$ and in which the underlying asset prices follow the Itô processes

$$\begin{cases} dS_v^{i*} = S_v^{i*}\left[(\delta_v^j - \delta_v^i)\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}\right] & \text{for } i \ne j \text{ and } v \ge t \\ dS_v^{j*} = S_v^{j*}\left[(\delta_v^j - r_v)\,dv + \sigma_v^j\,dz_v^{j*}\right] & \text{for } i = j \text{ and } v \ge t \end{cases}$$

with respective initial conditions $S^i$ for $i \ne j$ and K for $i = j$. The process $z^{j*}$ is defined by

$$dz_v^{j*} = -d\tilde{z}_v + \sigma_v^j\,dv$$

for all $v \in [0, T]$, $z_0^{j*} = 0$. The optimal exercise time for the f-claim is the same as the optimal exercise time for the $f^j$-claim in the auxiliary financial market.

Some particular cases are the natural counterpart of standard multiasset options.

1. Cumulative barrier max- and min-options: When there are two underlying assets call options in this category have payoff functions of the form $(S_t^1 \vee S_t^2 - K)^+ 1_{\{O_t^{S,A} \ge b\}}$ (max-option) or $(S_t^1 \wedge S_t^2 - K)^+ 1_{\{O_t^{S,A} \ge b\}}$ (min-option), where $b \in [0, T]$. Similarly for put options. It is easily verified that a cumulative barrier call max-option is symmetric to a cumulative barrier option to exchange the maximum of an asset and cash against another asset for which the occupation time has been adjusted.

2. Cumulative barrier exchange options: The payoff function takes the form $(S^1 - S^2)1_{\{O_t^{S,A} \ge b\}}$. This exchange option is symmetric to cumulative barrier call and put options with suitably adjusted occupation times.

3. Quantile options (Miura (1992), Akahori (1995), Dassios (1995)): An α-quantile call option pays off $(M(\alpha, t) - K)$ upon exercise where $M(\alpha, t) = \inf\{x : \int_0^t 1_{\{S_v \le x\}}\,dv > \alpha t\} = \inf\{x : O_t^{S,A(x)} > \alpha t\}$. Consider an α-quantile strike put with payoff $(M(\alpha, t) - S_t)$. Note that

$$\begin{aligned} M(\alpha, t) &= \inf\left\{x : \int_0^t 1_{\{S_v \le x\}}\,dv > \alpha t\right\} = \inf\left\{x : \int_0^t 1_{\{S S_v/S_t \le S x/S_t\}}\,dv > \alpha t\right\} \\ &= (S_t/S)\inf\left\{y : \int_0^t 1_{\{S S_v/S_t \le y\}}\,dv > \alpha t\right\} \equiv (S_t/S)M^*(\alpha, t) \end{aligned}$$

where $M^*(\alpha, t)$ is the α-quantile of the normalized price $S^*_{v,t} \equiv S S_v/S_t$ for $v \le t$. Thus $M(\alpha, t) = (S_t/S)M^*(\alpha, t)$ and an α-quantile strike put is seen to be symmetric to an α-quantile call option with (fixed) strike price S and quantile based on the normalized asset price $S^*_{v,t}$, $v \le t$.
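The scaling identity for the quantile in the last item can be illustrated on a discretely sampled path. The sketch below is a rough discrete analogue, not the continuous-time definition: the occupation time is replaced by the fraction of equally spaced observations, and the function name `alpha_quantile`, the sample path and the value of α are assumptions made for the example (with 0 ≤ α < 1).

```python
import math


def alpha_quantile(path, alpha):
    """Smallest x such that the fraction of observations with S_v <= x exceeds alpha."""
    ordered = sorted(path)
    # The count of observations <= ordered[k] is at least k + 1, so the first
    # level whose count exceeds alpha * n sits at index floor(alpha * n).
    return ordered[math.floor(alpha * len(ordered))]


# Illustrative sampled path (assumed); path[0] plays the role of S, path[-1] of S_t.
path = [100.0, 104.0, 97.0, 103.0, 110.0, 108.0, 115.0]
alpha = 0.5
initial_price, current_price = path[0], path[-1]

quantile = alpha_quantile(path, alpha)
# Normalized price S S_v / S_t and its quantile M*.
normalized_path = [initial_price * s_v / current_price for s_v in path]
normalized_quantile = alpha_quantile(normalized_path, alpha)
rescaled = current_price / initial_price * normalized_quantile

print(quantile, rescaled)
```

Because multiplying a path by a positive constant preserves the ordering of its observations, the quantile of the path and the rescaled quantile of the normalized path agree up to rounding.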

Multiasset step options can also be defined in a natural manner and satisfy symmetry properties akin to those of standard multiasset options.

7 Symmetry property without homogeneity of degree one

Several derivative securities have payoffs that are not homogeneous of degree one. Examples include digital options and quantile options (homogeneous of degree ν = 0) or product options (homogeneous of degree ν ≠ 0, 1). Product options (options on a product of assets) include options on foreign indices with payoff in domestic currency such as quanto options. As we show below, even in these cases, symmetry-like properties link various types of contracts.


Consider an f-claim on n underlying assets whose payoff is homogeneous of degree ν, i.e., $f(\lambda S, \lambda K) = \lambda^\nu f(S, K)$ for some $\nu \ge 0$ and for all $\lambda > 0$. The following result is then valid.

Theorem 16 Consider an American f-claim with maturity date T and a continuous and homogeneous of degree ν payoff function f(S, K). Let $V(S, K, r, \delta; \mathcal{F}_t)$ denote the value of the claim in the financial market with filtration $\mathcal{F}_{(\cdot)}$, asset prices $S_t$ satisfying (37) and progressively measurable interest rate r. For $j = 1, \ldots, n$, define

$$\begin{aligned} r^{j*} &= (1 - \nu)r + \nu\delta^j + \tfrac{1}{2}\nu(1 - \nu)\sigma^j\sigma^j \\ \delta^{i*} &= (1 - \nu)r + \delta^i + (\nu - 1)\delta^j + (1 - \nu)\left(-1 + \tfrac{1}{2}\nu\right)\sigma^j\sigma^j + (1 - \nu)\sigma^i\sigma^j, \quad \text{for } i \ne j \\ \delta^{j*} &= (2 - \nu)r + (\nu - 1)\delta^j + (1 - \nu)\left(-1 + \tfrac{1}{2}\nu\right)\sigma^j\sigma^j. \end{aligned}$$

Prior to exercise the value of the claim is, for any $j = 1, \ldots, n$,

$$V(S_t, K, r, \delta; \mathcal{F}_t) = V^j(S_t^*, S^j, r^{j*}, \delta^*; \mathcal{F}_t)$$

where $V^j(S_t^*, S^j, r^{j*}, \delta^*; \mathcal{F}_t)$ is the value of the $f^j$-claim with parameter $S^j$ and maturity date T in an auxiliary financial market with interest rate $r^{j*}$ and in which the underlying asset prices follow the Itô processes

$$\begin{cases} dS_v^{i*} = S_v^{i*}\left[(r_v^{j*} - \delta_v^{i*})\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}\right] & \text{for } i \ne j \text{ and } v \in [t, T] \\ dS_v^{j*} = S_v^{j*}\left[(r_v^{j*} - \delta_v^{j*})\,dv + \sigma_v^j\,dz_v^{j*}\right] & \text{for } i = j \text{ and } v \in [t, T] \end{cases}$$

with respective initial conditions $S_t^{*i} = S^i$ for $i \ne j$ and $S_t^{*j} = K$ for $i = j$. The process $z^{j*}$ is defined by

$$dz_v^{j*} = -d\tilde{z}_v + \nu\sigma_v^j\,dv, \quad \text{for } v \in [0, T]; \quad z_0^{j*} = 0.$$

The optimal exercise time for the f-claim is the same as the optimal exercise time for the $f^j$-claim in the auxiliary financial market.
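The coefficient definitions of Theorem 16 are easy to tabulate. The sketch below is a minimal illustration that treats the volatilities as scalars (an assumption made here for simplicity) and uses hypothetical names and parameter values. It evaluates $r^{j*}$, $\delta^{i*}$ and $\delta^{j*}$ for a given degree ν, so that the degree one and degree zero cases and the drifts of the auxiliary price processes can be checked directly.

```python
def auxiliary_coefficients(nu, r, delta_i, delta_j, sigma_i, sigma_j):
    """Return (r_j_star, delta_i_star, delta_j_star) for homogeneity degree nu.

    Volatilities are treated as scalars, which is an assumption of this sketch.
    """
    variance_j = sigma_j * sigma_j
    common = (1.0 - nu) * (-1.0 + 0.5 * nu) * variance_j
    r_star = (1.0 - nu) * r + nu * delta_j + 0.5 * nu * (1.0 - nu) * variance_j
    delta_i_star = ((1.0 - nu) * r + delta_i + (nu - 1.0) * delta_j
                    + common + (1.0 - nu) * sigma_i * sigma_j)
    delta_j_star = (2.0 - nu) * r + (nu - 1.0) * delta_j + common
    return r_star, delta_i_star, delta_j_star


# Illustrative inputs (assumed, not taken from the text).
r, delta_i, delta_j, sigma_i, sigma_j = 0.05, 0.01, 0.03, 0.20, 0.30

degree_one = auxiliary_coefficients(1.0, r, delta_i, delta_j, sigma_i, sigma_j)
degree_zero = auxiliary_coefficients(0.0, r, delta_i, delta_j, sigma_i, sigma_j)
degree_two = auxiliary_coefficients(2.0, r, delta_i, delta_j, sigma_i, sigma_j)

print(degree_one, degree_zero, degree_two)
```

For ν = 1 the triple reduces to (δ^j, δ^i, r), and for any ν the differences $r^{j*} - \delta^{i*}$ and $r^{j*} - \delta^{j*}$ should reproduce the drifts $\delta^j - \delta^i + (1 - \nu)(\sigma^j - \sigma^i)\sigma^j$ and $\delta^j - r + (1 - \nu)\sigma^j\sigma^j$.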

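Proof of Theorem 16 Define $S^j = S_t^j$. Let

$$r_v^{j*} = (1 - \nu)r_v + \nu\delta_v^j + \tfrac{1}{2}\nu(1 - \nu)\sigma_v^j\sigma_v^j$$

and note that

$$\left(\frac{S_\tau^j}{S^j}\right)^\nu \exp\left(-\int_t^\tau r_v\,dv\right) = \exp\left(-\int_t^\tau r_v^{j*}\,dv\right)\exp\left(-\frac{1}{2}\nu^2\int_t^\tau \sigma_v^j\sigma_v^j\,dv + \nu\int_t^\tau \sigma_v^j\,d\tilde{z}_v\right).$$

Defining the equivalent measure

$$dQ^{j*} = \exp\left(-\frac{1}{2}\nu^2\int_0^T \sigma_v^j\sigma_v^j\,dv + \nu\int_0^T \sigma_v^j\,d\tilde{z}_v\right)dQ$$

enables us to write

$$\begin{aligned} V(S_t, K, r, \delta; \mathcal{F}_t) &= \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right)f(S_\tau, K)\ \Big|\ \mathcal{F}_t\right] \\ &= \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right)\left(\frac{S_\tau^j}{S^j}\right)^\nu f\left(S_\tau\frac{S^j}{S_\tau^j}, K\frac{S^j}{S_\tau^j}\right)\ \Big|\ \mathcal{F}_t\right] \\ &= \sup_{\tau \in \mathcal{S}_{t,T}} E^{j*}\left[\exp\left(-\int_t^\tau r_v^{j*}\,dv\right)f\left(S_\tau\frac{S^j}{S_\tau^j}, S_\tau^{j*}\right)\ \Big|\ \mathcal{F}_t\right] \\ &= \sup_{\tau \in \mathcal{S}_{t,T}} E^{j*}\left[\exp\left(-\int_t^\tau r_v^{j*}\,dv\right)f^j(S_\tau^*, S^j)\ \Big|\ \mathcal{F}_t\right] \\ &= V^j(S_t^*, S^j, r^{j*}, \delta^*; \mathcal{F}_t). \end{aligned}$$

Under $Q^{j*}$ the process

$$dz_v^{j*} = -d\tilde{z}_v + \nu\sigma_v^j\,dv$$

is a Brownian motion and $S^{i*}$ satisfies, for $i \ne j$ and $v \in [t, T]$

$$\begin{aligned} dS_v^{i*} &= S_v^{i*}\left[(\delta_v^j - \delta_v^i + (\sigma_v^j - \sigma_v^i)\sigma_v^j)\,dv - (\sigma_v^j - \sigma_v^i)\,d\tilde{z}_v\right] \\ &= S_v^{i*}\left[(\delta_v^j - \delta_v^i + (\sigma_v^j - \sigma_v^i)\sigma_v^j)\,dv + (\sigma_v^j - \sigma_v^i)\left[dz_v^{j*} - \nu\sigma_v^j\,dv\right]\right] \\ &= S_v^{i*}\left[(\delta_v^j - \delta_v^i + (1 - \nu)(\sigma_v^j - \sigma_v^i)\sigma_v^j)\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}\right] \\ &= S_v^{i*}\left[(r_v^{j*} - \delta_v^{i*})\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}\right] \end{aligned}$$

where

$$\delta_v^{i*} = (1 - \nu)r_v + \delta_v^i + (\nu - 1)\delta_v^j + (1 - \nu)\left(-1 + \tfrac{1}{2}\nu\right)\sigma_v^j\sigma_v^j + (1 - \nu)\sigma_v^i\sigma_v^j$$

and for $i = j$ and $v \in [t, T]$

$$\begin{aligned} dS_v^{j*} &= S_v^{j*}\left[(\delta_v^j - r_v + \sigma_v^j\sigma_v^j)\,dv - \sigma_v^j\,d\tilde{z}_v\right] \\ &= S_v^{j*}\left[(\delta_v^j - r_v + (1 - \nu)\sigma_v^j\sigma_v^j)\,dv + \sigma_v^j\,dz_v^{j*}\right] \\ &= S_v^{j*}\left[(r_v^{j*} - \delta_v^{j*})\,dv + \sigma_v^j\,dz_v^{j*}\right] \end{aligned}$$

where

$$\delta_v^{j*} = (2 - \nu)r_v + (\nu - 1)\delta_v^j + (1 - \nu)\left(-1 + \tfrac{1}{2}\nu\right)\sigma_v^j\sigma_v^j.$$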

This completes the proof of the theorem.

Remark 17 When the claim is homogeneous of degree 1 the interest rate and the dividend rates in the economy with numeraire j become $r_v^{j*} = \delta_v^j$, $\delta_v^{i*} = \delta_v^i$ for $i \ne j$, and $\delta_v^{j*} = r_v$. Thus we recover the prior results of Theorem 13.

Another special case of interest is when the payoff function is homogeneous of degree 0. The economy with numeraire j then has characteristics

$$\begin{aligned} r^{j*} &= r \\ \delta^{i*} &= r + \delta^i - \delta^j - (\sigma^j - \sigma^i)\sigma^j, \quad \text{for } i \ne j \\ \delta^{j*} &= 2r - \delta^j - \sigma^j\sigma^j \end{aligned}$$

and the underlying asset prices follow the Itô processes

$$\begin{cases} dS_v^{i*} = S_v^{i*}\left[(r_v^{j*} - \delta_v^{i*})\,dv + (\sigma_v^j - \sigma_v^i)\,dz_v^{j*}\right] & \text{for } i \ne j \text{ and } v \in [t, T] \\ dS_v^{j*} = S_v^{j*}\left[(r_v^{j*} - \delta_v^{j*})\,dv + \sigma_v^j\,dz_v^{j*}\right] & \text{for } i = j \text{ and } v \in [t, T] \end{cases}$$

with respective initial conditions $S_t^{*i} = S^i$ for $i \ne j$ and $S_t^{*j} = K$ for $i = j$. The process $z^{j*}$ is defined by $dz_v^{j*} = -d\tilde{z}_v$, for $v \in [0, T]$. It is a Brownian motion under $Q^* = Q$. Examples of contracts in this category are

1. Digital options: A digital call option ($f(S, K) = 1_{\{S \ge K\}}$) is symmetric to a digital put option with strike $S = S_t$, written on an asset with dividend rate $\delta^* = 2r - \delta - \sigma^2$, in an economy with interest rate $r^* = r$.

2. Digital multiasset options: A digital call max-option ($f(S^1, S^2, K) = 1_{\{S^1 \vee S^2 \ge K\}}$) is symmetric to a digital option to exchange the maximum of an asset and cash against another asset ($f^2(S^1, S^2, K) = 1_{\{S^{*1} \vee K \ge S^{*2}\}}$, where $K = S^2$) in the economy with asset j = 2 as numeraire (with characteristics $r^{2*} = r$, $\delta^{1*} = r + \delta^1 - \delta^2 - (\sigma^2 - \sigma^1)\sigma^2$, and $\delta^{2*} = 2r - \delta^2 - \sigma^2\sigma^2$). A digital call min-option ($f(S^1, S^2, K) = 1_{\{S^1 \wedge S^2 \ge K\}}$) is symmetric to a digital option to exchange the minimum of an asset and cash against another asset ($f^2(S^1, S^2, K) = 1_{\{S^{*1} \wedge K \ge S^{*2}\}}$, where $K = S^2$) in the same auxiliary economy. Similar relations hold for digital multiasset put options.

3. Cumulative barrier digital options: Symmetry properties for occupation time derivatives with homogeneous of degree zero payoffs can be easily identified by drawing on the previous section. A cumulative barrier digital call option with barrier L (i.e. payoff $f(S, K, O^{S,A^+(L)}) = 1_{\{S \ge K\}}1_{\{O_t^{S,A^+(L)} \ge b\}}$ where $A^+(L) = \{x \in \mathbb{R}_+ : (x - L)^+ \ge 0\}$) is symmetric to a cumulative barrier digital put option with barrier $L^* = KS/L$ (i.e. payoff $f^1(S^*, K, O^{S^*,A^-(L^*)}) = 1_{\{K \ge S^*\}}1_{\{O_t^{S^*,A^-(L^*)} \ge b\}}$ where $K = S$ and $A^-(L^*) = \{x \in \mathbb{R}_+ : (x - L^*)^- \ge 0\}$). A similar symmetry relation can be established for Parisian digital call and put options.

4. Quanto options: Consider again the quanto call option with payoff $e(S - K)^+$ in foreign currency where e is the Y/$ exchange rate. From the foreign perspective the contract is homogeneous of degree ν = 2 in the triplet (e, S, K). The results of Theorem 16 imply that the quanto call is symmetric to an exchange option in an economy with interest rate $r^{f*} = -r^f + 2r - \sigma^e\sigma^e$ and whose underlying assets have dividend rates

$$\begin{aligned} \delta^{1*} &= -r^f + \delta + r - \sigma\sigma^e \\ \delta^{2*} &= r. \end{aligned}$$

The call value can be written

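$$C^Q_t = e_t \sup_{\tau \in \mathcal{S}_{t,T}} E^{f*}\left[\exp\left(-\int_t^\tau r^{f*}_v\,dv\right)\left(S^{1*}_\tau - S^{2*}_\tau\right)^+ \,\Big|\, \mathcal{F}_t\right]$$

where

$$dS^{1*}_v = S^{1*}_v\left[(r^{f*}_v - \delta^{1*}_v)\,dv + (\sigma_v - \sigma^e_v)\,dz^{f*}_v\right], \quad \text{for } v \in [t, T]$$

$$dS^{2*}_v = S^{2*}_v\left[(r^{f*}_v - \delta^{2*}_v)\,dv + \sigma^e_v\,dz^{f*}_v\right], \quad \text{for } v \in [t, T],$$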

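with the initial conditions $S^{1*}_t = S_t$ and $S^{2*}_t = K$. An alternative representation for the quanto call was provided in Section 7.

Remark 18 Representation formulas involving the change of measure introduced in earlier sections can also be obtained with payoffs that are homogeneous of degree $\nu$. In this case the coefficients of the underlying asset price processes reflect the homogeneity degree of the payoff function. Indeed letting $S^j = S^j_t$ we can always write

$$\begin{aligned}
V(S_t, K, r, \delta; \mathcal{F}_t) &= \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right) f(S_\tau, K) \,\Big|\, \mathcal{F}_t\right] \\
&= \sup_{\tau \in \mathcal{S}_{t,T}} E\left[\exp\left(-\int_t^\tau r_v\,dv\right) \frac{S^j_\tau}{S^j}\, f\left(S_\tau \left(\frac{S^j}{S^j_\tau}\right)^{1/\nu}, K \left(\frac{S^j}{S^j_\tau}\right)^{1/\nu}\right) \Big|\, \mathcal{F}_t\right] \\
&= \sup_{\tau \in \mathcal{S}_{t,T}} E^{j*}\left[\exp\left(-\int_t^\tau \delta^j_v\,dv\right) f\left(\widehat S_\tau, \widehat S^{n+1}_\tau\right) \Big|\, \mathcal{F}_t\right]
\end{aligned}$$

where $\widehat S^i_v = S^i_v \left(S^j / S^j_v\right)^{1/\nu}$ for $i = 1, \ldots, n$ and $\widehat S^{n+1}_v = K \left(S^j / S^j_v\right)^{1/\nu}$ for $v \in [t, T]$. The auxiliary economy has interest rate $\delta^j$ and the equivalent measure $Q^{j*}$ is

$$dQ^{j*} = \exp\left(-\frac{1}{2}\int_0^T \sigma^j_v \sigma^{j\prime}_v\,dv + \int_0^T \sigma^j_v\,d\tilde z_v\right) dQ.$$

The process $dz^{j*}_v = -d\tilde z_v + \sigma^{j\prime}_v\,dv$, for $v \in [0, T]$, is a $Q^{j*}$-Brownian motion process.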
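The digital option symmetry listed as item 1 above can be checked numerically in its simplest special case. The sketch below is only an illustration under assumptions that are narrower than the text: constant coefficients, a single asset and European exercise, so that the standard Black–Scholes closed forms for cash-or-nothing digitals apply. The function names and parameter values are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def d2(spot, strike, r, div, sigma, T):
    """Black-Scholes d2 for an asset paying a continuous dividend yield."""
    return (log(spot / strike) + (r - div - 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))


def digital_call(spot, strike, r, div, sigma, T):
    """European digital call: pays 1 at T if S_T >= strike."""
    return exp(-r * T) * norm_cdf(d2(spot, strike, r, div, sigma, T))


def digital_put(spot, strike, r, div, sigma, T):
    """European digital put: pays 1 at T if S_T <= strike."""
    return exp(-r * T) * norm_cdf(-d2(spot, strike, r, div, sigma, T))


# Original economy: asset price S, strike K, rate r, dividend rate delta.
S, K, r, delta, sigma, T = 100.0, 110.0, 0.05, 0.02, 0.25, 0.75
call_value = digital_call(S, K, r, delta, sigma, T)

# Auxiliary economy: asset price K, strike S, same rate r,
# dividend rate delta* = 2r - delta - sigma^2.
delta_star = 2.0 * r - delta - sigma ** 2
put_value = digital_put(K, S, r, delta_star, sigma, T)

print(call_value, put_value)
```

In this special case the two values agree because the $d_2$ term of the auxiliary put is exactly the negative of the $d_2$ term of the original call.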

8 Changes of numeraire and representation of prices

In the financial markets of the previous sections the price of a contingent claim is the expectation of its discounted payoff, where discounting is at the riskfree rate and the expectation is taken under the risk neutral measure. This standard representation formula is implied by the ability to replicate the claim’s payoff using a suitably constructed portfolio of the basic securities in the model. Since symmetry properties are obtained by passing to a new numeraire, a natural question is whether contingent claims that are attainable in the basic financial markets are also attainable in the economy with new numeraire. This question is in fact essential for interpretation purposes since the symmetry properties above implicitly assume that the renormalized claims can be priced in the new numeraire economy and that their price corresponds to the one in the original economy.

For the case of nondividend paying assets Geman, El Karoui and Rochet (1995) prove that contingent claims that are attainable in one numeraire are also attainable in any other numeraire and that the replicating portfolios are the same. Our next theorem provides an extension of this result to dividend-paying assets. The framework of section 2 with Brownian filtration is adopted for convenience only; the results are valid for more general filtrations.

Theorem 19 Consider an economy with Brownian filtration and complete financial market with $n$ risky assets and one riskless asset. Suppose that risky assets pay dividends and that their prices follow Itô processes (37), and that the riskless asset pays interest at the rate $r$. Assume that all the coefficients are progressively measurable and bounded processes. If a contingent claim’s payoff is attainable in a given numeraire then it is also attainable in any other numeraire. The replicating portfolio is the same in all numeraires.

Proof of Theorem 19 Let $i = 0$ denote the riskless asset.
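The gains from trade in the primary assets are

$$dG^i_t \equiv dS^i_t + S^i_t \delta^i_t\,dt = S^i_t\left[r_t\,dt + \sigma^i_t\,d\tilde z_t\right], \quad \text{for } i = 1, \ldots, n$$

$$dG^0_t \equiv dB_t = B_t r_t\,dt, \quad \text{for } i = 0.$$

For $i = 0, \ldots, n$, gains from trade expressed in numeraire $j$ are

$$G^{i,j}_t = \frac{S^i_t}{S^j_t} + \int_0^t \frac{1}{S^j_v}\,\delta^i_v S^i_v\,dv \tag{40}$$

so that

$$\begin{aligned}
dG^{i,j}_t &= \frac{dS^i_t}{S^j_t} + S^i_t\,d\left(\frac{1}{S^j_t}\right) + d\left[S^i, \frac{1}{S^j}\right]_t + \frac{1}{S^j_t} S^i_t \delta^i_t\,dt \\
&= \frac{1}{S^j_t}\,dG^i_t + S^i_t\,d\left(\frac{1}{S^j_t}\right) + d\left[S^i, \frac{1}{S^j}\right]_t.
\end{aligned}$$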

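Now let $\pi^i$ represent the amount invested in asset $i$ and consider a portfolio $(\pi^0, \pi) \in \mathbb{R}^{n+1}$ such that $\int_0^T \pi_v' \sigma_v \sigma_v' \pi_v\,dv < \infty$ (P-a.s.). The wealth process $X$ generated by $N$, where $N^j = \pi^j / S^j$, $j = 0, \ldots, n$, represents the number of shares of each asset in the portfolio, satisfies

$$dX_t = \sum_{i=0}^n N^i_t\,dG^i_t$$

and $X_t = \sum_{i=0}^n N^i_t S^i_t$ (this portfolio is self financing since all dividends are reinvested). Using Itô’s lemma gives

$$\begin{aligned}
d\left(\frac{X_t}{S^j_t}\right) &= \sum_{i=0}^n N^i_t \frac{dG^i_t}{S^j_t} + X_t\,d\left(\frac{1}{S^j_t}\right) + \sum_{i=0}^n N^i_t\,d\left[G^i, \frac{1}{S^j}\right]_t \\
&= \sum_{i=0}^n N^i_t \left(\frac{dG^i_t}{S^j_t} + S^i_t\,d\left(\frac{1}{S^j_t}\right) + d\left[S^i, \frac{1}{S^j}\right]_t\right) \\
&= \sum_{i=0}^n N^i_t\,dG^{i,j}_t
\end{aligned}$$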

i.e. the normalized wealth process can be synthesized in the new numeraire economy in which all asset prices have been deflated by the numeraire asset j. Furthermore the investment policy which achieves normalized wealth is the same as in the original economy. Consequently, any deflated payoff is attainable in the new numeraire economy when the (undeflated) payoff is attainable in the original economy. Remark 20 (i) The proper definition of gains from trade in the new numeraire is instrumental in the proof above. Since dividends are paid over time they must be


deflated at a discount rate which reflects the timing of the cash flows. This explains the discount factor inside the integral of dividends in (40). (ii) Note that Theorem 19 applies even if the numeraire chosen is a portfolio of assets or any other progressively measurable process instead of one of the primitive assets. It also applies when the portfolio is not self financing, for example when there are infusions or withdrawal of funds over time. (iii) The results above apply for payoffs that are received at fixed time as well as stopping times of the filtration: if there exists a trading strategy that attains the random payoff X τ where τ ∈ S0,T in the original financial market then the normalized payoff X τ /Sτj is attainable in the economy with numeraire asset j. Our next result now follows easily from the above. Theorem 21 Suppose that asset j serves as numeraire and that S j satisfies (37). Define the probability measure Q j∗ by

$$dQ^{j*} = \frac{\exp\left(-\int_0^T (r_v - \delta^j_v)\,dv\right) S^j_T}{S^j_0}\,dQ = \exp\left(-\frac{1}{2}\int_0^T \sigma^j_v \sigma^{j\prime}_v\,dv + \int_0^T \sigma^j_v\,d\tilde z_v\right) dQ \tag{41}$$

and consider the discount rate $\delta^j$. Then the discounted prices of primary securities expressed in numeraire $j$ are $Q^{j*}$-supermartingales (discounted gains from trade in numeraire $j$ are $Q^{j*}$-martingales) and the price of any attainable security in the original economy can be represented as the expected discounted value of its cash flows expressed in numeraire $j$ where the discount rate is $\delta^j$ and the expectation is under the $Q^{j*}$-measure.

Proof of Theorem 21 Using definition (40) of gains from trade expressed in numeraire $j$ and Itô’s lemma gives

$$\begin{aligned}
dG^{i,j}_t &= \frac{1}{S^j_t}\,dS^i_t + S^i_t\,d\left(\frac{1}{S^j_t}\right) + \frac{1}{S^j_t} S^i_t \delta^i_t\,dt + d\left[S^i, \frac{1}{S^j}\right]_t \\
&= \frac{1}{S^j_t} S^i_t\left[r_t\,dt + \sigma^i_t\,d\tilde z_t\right] + S^i_t \frac{1}{S^j_t}\left[(\delta^j_t - r_t + \sigma^j_t \sigma^{j\prime}_t)\,dt - \sigma^j_t\,d\tilde z_t\right] - S^i_t \frac{1}{S^j_t}\,\sigma^i_t \sigma^{j\prime}_t\,dt \\
&= \frac{1}{S^j_t} S^i_t\left[\left(\delta^j_t + (\sigma^j_t - \sigma^i_t)\sigma^{j\prime}_t\right)dt + (\sigma^i_t - \sigma^j_t)\,d\tilde z_t\right] \\
&= \frac{1}{S^j_t} S^i_t\left[\delta^j_t\,dt + (\sigma^j_t - \sigma^i_t)\,dz^{j*}_t\right],
\end{aligned}$$

where $dz^{j*}_t = -d\tilde z_t + \sigma^{j\prime}_t\,dt$ is a $Q^{j*}$-Brownian motion process. Defining $S^{i*}_t = S^i_t / S^j_t$ we can then write

$$dS^{i*}_t = S^{i*}_t\left[(\delta^j_t - \delta^i_t)\,dt + (\sigma^j_t - \sigma^i_t)\,dz^{j*}_t\right]$$

i.e. the discounted price of asset $i$ in numeraire $j$, $\exp\left(-\int_0^t \delta^j_v\,dv\right) S^{i*}_t$, is a $Q^{j*}$-supermartingale where discounting is at the rate $\delta^j$. Alternatively the discounted gains from trade process

$$\exp\left(-\int_0^t \delta^j_v\,dv\right) S^{i*}_t + \int_0^t \exp\left(-\int_0^v \delta^j_u\,du\right) S^{i*}_v \delta^i_v\,dv$$

is a $Q^{j*}$-martingale. Thus, we can write the representation formula

$$S^{i*}_t = E^{j*}\left[\exp\left(-\int_t^T \delta^j_v\,dv\right) S^{i*}_T + \int_t^T \exp\left(-\int_t^v \delta^j_u\,du\right) S^{i*}_v \delta^i_v\,dv \,\Big|\, \mathcal{F}_t\right].$$

The relations satisfied by primary asset prices also apply to portfolios of primary assets and therefore to any contingent claim that is attainable. This completes the proof of the theorem.

Remark 22 When a dividend-paying primary asset price is chosen as deflator the auxiliary economy has an interest rate equal to the dividend rate of the deflator. In this new numeraire cash is converted into an asset that pays a dividend rate equal to the interest rate in the original economy. If we choose the discounted price $\bar S^j_t = \exp\left(-\int_0^t (r_v - \delta^j_v)\,dv\right) S^j_t$, which is a martingale, as numeraire the process $S^{i*}_t = S^i_t / \bar S^j_t$ satisfies

$$dS^{i*}_t = S^{i*}_t\left[(r_t - \delta^i_t)\,dt + (\sigma^j_t - \sigma^i_t)\,dz^{j*}_t\right]$$

and its discounted value at the riskfree rate is a $Q^{j*}$-supermartingale where $Q^{j*}$ is defined in (41). With this choice of numeraire the interest rate remains unchanged in the auxiliary economy. Cash is converted into an asset that pays a dividend rate equal to the interest rate and thus has null drift (martingale).

Remark 23 (i) Note that a payoff expressed in a new numeraire is not necessarily the same as the payoff evaluated at normalized underlying asset prices (i.e. prices expressed in the new numeraire). There is clearly equivalence when the payoff is homogeneous of degree one. With homogeneity of degree $\nu$ the payoff in the new numeraire is equivalent to the payoff function evaluated at underlying asset prices that are normalized by a power of the numeraire price. Normalized asset prices (in the payoff function) then differ from asset prices expressed in the new numeraire. (ii) A byproduct of Theorem 21 is a generalized “symmetry” property which applies to any payoff function. In this interpretation of the property the symmetric contract is simply the payoff expressed in the new numeraire.
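The representation in Theorem 21 can be illustrated with a small Monte Carlo experiment. The sketch below is a simplified check, not the general result: it assumes two assets with constant coefficients, a two-dimensional Brownian motion and a European (not American) option to exchange asset 2 for asset 1. It prices the claim once under $Q$ with discounting at $r$, and once in numeraire 2 under $Q^{2*}$ with discounting at $\delta^2$, and compares both with the Margrabe (1978) closed form extended to dividend yields. All parameter values and variable names are hypothetical.

```python
import numpy as np
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Illustrative constant coefficients (assumptions, not taken from the text).
r, T = 0.05, 1.0
S0 = np.array([100.0, 95.0])           # initial prices of assets 1 and 2
delta = np.array([0.03, 0.01])         # dividend rates
sigma = np.array([[0.20, 0.00],        # row i = volatility vector of asset i
                  [0.10, 0.25]])

rng = np.random.default_rng(0)
n = 400_000

# (a) Original economy: simulate both assets under Q, discount at r.
W = rng.standard_normal((n, 2)) * sqrt(T)
drift = (r - delta - 0.5 * (sigma ** 2).sum(axis=1)) * T
ST = S0 * np.exp(drift + W @ sigma.T)
price_q = exp(-r * T) * np.maximum(ST[:, 0] - ST[:, 1], 0.0).mean()

# (b) Numeraire asset 2: S1* = S1/S2 has drift delta2 - delta1 and
# volatility sigma2 - sigma1 under Q^{2*}; discount at delta2.
vol_star = sigma[1] - sigma[0]
Z = rng.standard_normal((n, 2)) * sqrt(T)
ratio_T = (S0[0] / S0[1]) * np.exp(
    (delta[1] - delta[0] - 0.5 * vol_star @ vol_star) * T + Z @ vol_star
)
price_star = S0[1] * exp(-delta[1] * T) * np.maximum(ratio_T - 1.0, 0.0).mean()

# (c) Closed form for the European exchange option with dividend yields.
vol = sqrt(vol_star @ vol_star)
d1 = (log(S0[0] / S0[1]) + (delta[1] - delta[0] + 0.5 * vol ** 2) * T) / (vol * sqrt(T))
d2 = d1 - vol * sqrt(T)
margrabe = S0[0] * exp(-delta[0] * T) * norm_cdf(d1) - S0[1] * exp(-delta[1] * T) * norm_cdf(d2)

print(price_q, price_star, margrabe)
```

The second simulation needs only the one-dimensional ratio $S^{1*}$, which is the practical appeal of the change of numeraire: the strike-like leg becomes the constant 1 and the discount rate becomes the dividend rate of the numeraire asset.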


Some extensions are worth mentioning.

Remark 24 Note that the results on the replication of attainable contingent claims, their financing portfolios and their representation under new measures are valid even when markets are incomplete. Indeed if the claims under consideration can be replicated in a given incomplete market equilibrium (i.e. if the claims’ payoffs live in the asset span) then they can also be replicated under a change of numeraire. The results are also valid when the market is effectively complete (single agent economies). In this case, even when claims’ payoffs cannot be duplicated, they have a unique price which can be expressed in different forms corresponding to various choices of numeraire.

9 Conclusion

In this paper we have reviewed and extended recent results on PCS. Features of the models considered include (i) financial markets with progressively measurable coefficients, (ii) random maturity options, (iii) options on multiple underlying assets, (iv) occupation time derivatives and (v) payoff functions that are homogeneous of degree $\nu \neq 1$. One important element in the proofs is the ability to renormalize a vector of prices and parameters which determine the payoff of the contract. Homogeneity of degree $\nu$ is sufficient in that regard but it is not a necessary condition. Another important element in the proofs is the separation between the role of informational variables and the change of measure (numeraire). Indeed, while the change of measure converts the underlying assets into normalized or symmetric assets in the auxiliary financial market, the information sets in the two markets are kept the same. This separation enables us to derive symmetry properties even for financial markets in which prices do not follow Markov processes. In the context of diffusion models the change of measure is instrumental for obtaining symmetry properties of option prices without restricting volatility coefficients.

Some of the results in the paper can be readily extended. Symmetry-like properties hold for multiasset contracts even when the payoff functions are not homogeneous of some degree $\nu$ (for instance when homogeneity of different degrees holds relative to different subsets of the underlying asset prices). In this instance normalized prices in the auxiliary economy involve further adjustments to dividends and volatilities. Likewise the methodology reviewed in this paper also applies, in principle, to complete financial markets with general semimartingales or even to incomplete markets provided that the securities under consideration lie in the asset span.


References

Akahori, J. (1995), Some formulae for a new type of path-dependent option. Annals of Applied Probability 5, 383–8.
Bensoussan, A. (1984), On the theory of option pricing. Acta Applicandae Mathematicae 2, 139–58.
Bjerksund, P. and Stensland, G. (1993), American exchange options and a put–call transformation: a note. Journal of Business, Finance and Accounting 20, 761–4.
Black, F. and Scholes, M. (1973), The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–54.
Broadie, M. and Detemple, J.B. (1995), American capped call options on dividend-paying assets. Review of Financial Studies 8, 161–91.
Broadie, M. and Detemple, J.B. (1997), The valuation of American options on multiple assets. Mathematical Finance 7, 241–85.
Carr, P. and Chesney, M. (1996), American put call symmetry. Working paper.
Carr, P., Jarrow, R. and Myneni, R. (1992), Alternative characterizations of American put options. Mathematical Finance 2, 87–106.
Chesney, M. and Gibson, R. (1993), State space symmetry and two factor option pricing models, in J. Janssen and C.H. Skiadas, eds, Applied Stochastic Models and Data Analysis. World Scientific Publishing Co., Singapore.
Chesney, M., Jeanblanc-Picqué, M. and Yor, M. (1997), Brownian excursions and Parisian barrier options. Advances in Applied Probability 29, 165–84.
Dassios, A. (1995), The distribution of the quantile of a Brownian motion with drift and the pricing of related path-dependent options. Annals of Applied Probability 5, 389–98.
Detemple, J.B., Feng, S. and Tian, W. (2000), The valuation of American options on the minimum of dividend-paying assets. Working paper, Boston University.
Gao, B., Huang, J.Z. and Subrahmanyam, M. (2000), The valuation of American barrier options using the decomposition technique. Journal of Economic Dynamics and Control, to appear.
Garman, M. (1989), Recollection in Tranquility. Risk 24, 1783–827.
Geman, H., El Karoui, N. and Rochet, J.C. (1995), Changes of numeraire, changes of probability measure and option pricing. Journal of Applied Probability 32, 443–58.
Girsanov, I.V. (1960), On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Theory of Probability and Its Applications 5, 285–301.
Goldman, B., Sosin, H. and Gatto, M. (1979), Path-dependent options: buy at the low, sell at the high. Journal of Finance 34, 1111–27.
Grabbe, O. (1983), The pricing of call and put options on foreign exchange. Journal of International Money and Finance 2, 239–53.
Hugonnier, J. (1998), The Feynman–Kac formula and pricing occupation time derivatives. Working paper, ESSEC.
Jacka, S.D. (1991), Optimal stopping and the American put. Mathematical Finance 1, 1–14.
Karatzas, I. (1988), On the pricing of American options. Appl. Math. Optim. 17, 37–60.
Karatzas, I. and Shreve, S. (1988), Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.
Kholodnyi, V.A. and Price, J.F. (1998), Foreign Exchange Option Symmetry. World Scientific Publishing Co., New Jersey.
Kim, I.J. (1990), The analytic valuation of American options. Review of Financial Studies 3, 547–72.
Linetsky, V. (1999), Step options. Mathematical Finance 9, 55–96.
Margrabe, W. (1978), The value of an option to exchange one asset for another. Journal of Finance 33, 177–86.
McDonald, R. and Schroder, M. (1990), A parity result for American options. Journal of Computational Finance. Working paper, Northwestern University.
McKean, H.P. (1965), A free boundary problem for the heat equation arising from a problem in mathematical economics. Industrial Management Review 6, 32–9.
Merton, R.C. (1973), Theory of rational option pricing. Bell Journal of Economics and Management Science 4, 141–83.
Miura, R. (1992), A note on look-back option based on order statistics. Hitotsubashi Journal of Commerce and Management 27, 15–28.
Rubinstein, M. (1991), One for another. Risk.
Schroder, M. (1999), Changes of numeraire for pricing futures, forwards and options. Review of Financial Studies 12, 1143–63.

4 Purely Discontinuous Asset Price Processes Dilip B. Madan

1 Introduction

Prices of assets determined in highly liquid financial markets are generally viewed as continuous functions of time. This is true of the Black–Scholes (1973) and Merton (1973) model of geometric Brownian motion for the dynamics of the price of a stock, and of its many successors, which include the stochastic volatility models of Hull and White (1987) and Heston (1993) and the more recent advances in modeling the evolution of the local volatility surface by Derman and Kani (1994) and Dupire (1994). Jumps or discontinuities, when considered, have been added on as an additional orthogonal compound Poisson process also impacting the stock, as for example in Press (1967), Merton (1976), Cox and Ross (1976), Naik and Lee (1990), Bates (1996), and Bakshi and Chen (1997). This class of models is broadly referred to as jump-diffusion models and, as the name suggests, they are mixture models that study high activity and low activity events using two orthogonal modeling strategies.

The purpose of this chapter is to present the case for an alternative approach that stands in sharp contrast to the above mentioned models and synthesizes the study of high and low activity price movements using a class of purely discontinuous price processes. The contrast with the above class of models is that the processes advocated here have no continuous component, as all jump-diffusions must have, and furthermore, the discontinuities are infinite in number, with moves of larger sizes coming at a slower rate than moves of smaller sizes. Additionally, the jump-diffusion models have what is called infinite variation, in that the sum of absolute price moves is infinite in any interval and one must square these moves before their sum is finite (the property of finite quadratic variation), while the processes we advocate are of finite variation. Unlike jump-diffusions, our processes model price up ticks and down ticks separately and the price process can be decomposed as the difference of two increasing processes representing the increases and decreases of


prices. We shall also demonstrate that the finite variation property of the proposed models enhances their robustness and thereby their relevance for economic modeling.

This chapter summarizes the findings of research that I have conducted over the past 15 years in collaboration with a number of coauthors. The research is still ongoing, with a number of new and interesting developments already in place, but we shall focus attention on what has been learned to date. The papers that are summarized here are Madan and Seneta (1990), Madan and Milne (1991), Madan, Carr and Chang (1998), Carr and Madan (1998), (1998), Geman, Madan and Yor (2000), Bakshi and Madan (1998a,b).1

The case for purely discontinuous price processes is, as it should be, an argument with many facets. First we summarize the empirical findings on the study of both the statistical and risk neutral processes and observe the empirical need to consider discontinuous processes as relevant candidates. Statistical reality by itself, however, is not a convincing argument. Unsupported by a theoretical understanding of market fundamentals, statistical modeling is at best a spurious coincidence. One must consider the implications of a fundamental economic analysis. We show that economic analysis, with the help of some deep structural mathematical results, points in the same direction: the use of purely discontinuous price processes.

Statistical reality and theoretical conviction are ultimately no match for success. If the wrong model is brilliantly successful in delivering results, while the right one is relatively barren, then we have little choice but to work with the incorrect model, bearing in mind its limitations. To address this concern we present some of the successes of modeling with a purely discontinuous price process.
We match the success of Brownian motion in option pricing and portfolio management with the success of the purely discontinuous VG process obtained on time changing Brownian motion by a gamma process. The improvement in option pricing is clear, eliminating the implied volatility smile in the strike direction, and we are able to go further in portfolio management and study the optimal management of portfolios of derivative securities, a question that is relatively untouched in the diffusion context. In fact we successfully calibrate observed derivative portfolios as optimal and employ revealed preference methods to infer what we call the position measure but is better known as the personalized state price density. The perspective of purely discontinuous price processes, we conclude, is not only correct from a statistical and theoretical viewpoint, but is also rich in results and interesting applications.

1 The last of these papers is a working paper and can be obtained from my web site: www.dilip-madan.com.

The statistical findings we summarize confirm from a variety of perspectives that the local motion of the stock price is not Gaussian. This is true of both


the time series of moves and the pricing distribution of moves as reflected in option prices. Apart from these standard tests of normality we also consider the behavior of extremal events. Relying on asymptotic laws of maxima and minima of independent sampled observations (see Embrechts, Klüppelberg and Mikosch (1997)), we employ long time series of returns and reject the hypothesis that asset return distributions are locally Gaussian. They lie in the domain of attraction of the Fréchet distribution that includes the log gamma formulation of the VG process. Additionally we investigate empirically the relationship between arrival rates of jumps of different sizes with the jump size. The focus of our attention is on whether arrival rates display a monotonicity with respect to size, decreasing as the size rises, and whether the assumption of an infinite arrival rate is supported by a casual analysis of arrival rates. We conclude in favor of infinite and decreasing arrival rates.

From a theoretical perspective, we concentrate on the implications of no arbitrage, a property that is fundamental to all models for the asset price process. This property is shown to imply that asset prices in continuous time must be modeled by a time changed Brownian motion. The question at issue is then the nature of the time change. We investigate whether the time change could be continuous, with the resultant implication of the continuity of the price process, and show that this is possible only in economies where returns are locally Gaussian and time is locally deterministic and non-random. Given the overwhelming evidence on the lack of a locally Gaussian return distribution we are led to entertain the lack of continuity of the price process. This modeling choice is also consistent with observations on studying the relationship between time changes and economic activity, whereby we learn that time changes are related to some measure of the rate of arrival of orders or trades. As the latter have a random element, and are not locally deterministic, this suggests that such properties are inherited by the time change and hence once again we are led to the class of discontinuous price processes.

Within the class of discontinuous processes we begin our search by focusing attention in the first instance on processes with identical and independently distributed increments: a property shared with Brownian motion, the base model for the underlying uncertainty in the continuous case. This leads naturally via the Lévy–Khintchine theorem for such processes to considering Lévy processes characterized by their Lévy densities, whose empirical counterparts are precisely the relationship between arrival rates of jumps of different sizes and the jump size noted earlier in our empirical analysis. When the Lévy density integrates the absolute value of the jump size in the neighborhood of zero, a case we restrict attention to, the process has finite variation and can be decomposed into the difference of two increasing processes that constitute our models for the price up and down ticks. We suggest this model as a partial equilibrium model that clears market buy orders with


an up tick price response as the order is cleared through the limit sell book. The converse holds for market sell orders, which are cleared through the limit buy book at a price down tick.

An alternative and interesting economic model for price responses goes back to traditional dynamic models of price adjustment that represent the rate of adjustment as a function of the level of excess demand in the economy. We term this function, relating the rate of change of prices to excess demand, the force function of the economy. Modeling excess demand by Brownian motion, we may write the price process as the difference between price increases occurring during positive excursions of Brownian motion less the cumulated decreases that occur on negative excursions of Brownian motion. Such a price process is of course open to arbitrage by trades that reverse themselves during a single excursion of Brownian motion. For example, on a single positive excursion, one buys at a price and then sells at a higher price in the same excursion. To avoid such arbitrage, we restrict equilibrium trading to equilibrium times by requiring these to occur at the zero set of Brownian motion. This is organized by evaluating the disequilibrium price process at the inverse local time of Brownian motion. The resulting price process inherits the property of being purely discontinuous from inverse local time, and the process is the difference of two increasing processes that cumulate price responses during positive and negative excursions.

The two models of discontinuous price processes, (i) Lévy processes and (ii) integrals of force functionals of Brownian motion to inverse local time, are surprisingly related under the hypothesis of complete monotonicity of the Lévy density.2 Every force function has associated with it a completely monotone Lévy density and for every completely monotone Lévy density there exists an equivalent representation of the price process using a force function.
The equivalence is however a consequence of some deep results from number theory and hence the surprise.

We also consider the issue of robustness of the economic model with respect to tolerance of a heterogeneity of views on parameters and observe that the property of bounded variation in the price process is critical for delivering such robustness. Our concern in robustness with respect to views on parameters is that different beliefs should naturally allow for different probabilities, but the probabilities should remain equivalent and not become singular. With infinite variation there are many cases where a change in certain parameters induces singularity of measures.

2 The Lévy density is completely monotone if each of its two halves on the positive and negative side have the property of sign alternating derivatives or equivalently can be expressed as Laplace transforms of positive functions on the positive half line. Hence, they are essentially mixtures of exponential densities.

With the theoretical and statistical foundations in sufficient harmony, and two broad classes of models outlined in sufficient detail, we turn our attention to the


study of particularly rich examples in this class of models. The basic generalization of geometric Brownian motion we introduce is the VG process that introduces two additional parameters providing control over skewness and kurtosis. The model arises on evaluating Brownian motion with drift at a random time given by a gamma process. The volatility of the gamma process provides control over kurtosis while the drift in the Brownian motion before the time change controls skewness. We show that this model is successful in option pricing, eliminating the smile in the strike direction with relative ease. Fundamental to the world of purely discontinuous price processes is the property of options being market completing assets with a genuine role to play in the economy and a natural demand for these assets by investors. Recognizing these properties, we reconsider the problem of optimal derivative investment in continuous time, keeping in place Mertonian (1971) objective functions for the investor but expanding the asset space to include all European options on the underlying stock for all strikes and maturities. We find that for HARA utilities and VG statistical and risk neutral measures the derivative investment problem may be solved in closed form and leads in such economies to a healthy demand for at-the-money short maturity options: precisely the options with the greatest liquidity in financial markets. One may view the Black–Scholes economy as teaching us about stock delta positions in option hedging, while the first lessons of investment in purely discontinuous high activity price processes are about positioning in short maturity at-the-money options. With some courage we consider replicating actual trader derivative positions as optimal ones, allowing in the process adjustments in the level of risk aversion in power utility and a view on subjective kurtosis that may differ from the statistically observed kurtosis level. 
Kurtosis is particularly hard to estimate as its variance is of the order of the eighth moment. With this two dimensional flexibility, we are amazingly successful in many instances in calibrating actual spot slides as optimal wealth responses from the perspective of our continuous time optimal derivative investment model.3

3 The spot slide of a derivatives book graphs the value of the book as a function of the level of the underlying, typically varying the underlying in the range plus or minus 30% of spot for equity assets.

Having inferred risk aversion and the characteristics of subjective probability consistent with replicating observed positions as optimal, we may construct the personalized state price density that values options at a dollar amount yielding a marginal utility that matches the future expected marginal utility from holding the option. We call this state price density the position measure and provide explicit constructions of position measures, contrasting them with the risk neutral and statistical measures. We find generally that position measures are closer to the statistical measure and lie between the statistical and risk neutral measure. This is consistent with the view that traders are aware of relative frequency of


occurrence of market moves and their prices and accordingly make markets in option contracts.

The outline for the rest of the chapter is as follows. Section 2 presents a summary of the statistical results. The economic consequences of no arbitrage are described in section 3, while the two equivalent but apparently different economic models of the price process are summarized in section 4. The task of constructing specific examples consistent with the statistical and economic observations of these sections is taken up in section 5. The basic operating model of the VG process is introduced in section 6. Its successes in option pricing are summarized in section 7. Optimal solutions to the asset allocation problem with derivatives are presented in section 8 and employed to infer position measures in section 9. Section 10 concludes.
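The introduction describes the VG process as Brownian motion with drift evaluated at a random time given by a gamma process, with the volatility of the time change governing kurtosis and the drift governing skewness. The following sketch simulates increments of such a process. The parameterization is an assumption here (the chapter introduces its operating model only in section 6): the gamma process is taken to have unit mean rate and variance rate `nu`, the Brownian motion has drift `theta` and volatility `sigma`, and all numerical values are made up for illustration.

```python
import numpy as np


def vg_increments(n, dt, sigma, nu, theta, rng):
    """Draw n increments over a step dt of X(t) = theta*G(t) + sigma*W(G(t)),
    where G is a gamma process with mean rate 1 and variance rate nu."""
    g = rng.gamma(shape=dt / nu, scale=nu, size=n)   # random time elapsed
    z = rng.standard_normal(n)
    return theta * g + sigma * np.sqrt(g) * z


def skewness(x):
    c = x - x.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5


def kurtosis(x):
    c = x - x.mean()
    return (c ** 4).mean() / (c ** 2).mean() ** 2


rng = np.random.default_rng(1)
sigma, nu, theta, dt = 0.2, 0.5, -0.1, 1.0
x = vg_increments(200_000, dt, sigma, nu, theta, rng)

# Moments implied by conditioning on the gamma time.
mean_theory = theta * dt
var_theory = (sigma ** 2 + theta ** 2 * nu) * dt

print(x.mean(), mean_theory)
print(x.var(), var_theory)
print(skewness(x), kurtosis(x))   # negative skew, kurtosis above 3
```

Setting `theta = 0` gives a symmetric distribution whose excess kurtosis grows with `nu`, while a negative `theta` produces the left skew seen in the sample statistics printed above.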

2 Properties of the price process

This section summarizes some of the broad properties of the statistical and risk neutral price process. We address issues related to the normality of the motion, the behavior of extreme moves and the shape of the density of arrival rates of price moves. The emphasis in all cases is on the movement over short horizons as we view the macro moves as cumulated short moves.

2.1 Long-tailedness of historical returns

We begin by considering some well known results about the long-tailedness of the statistical return distribution and standard chi-square goodness of fit tests of normality of the return distribution. Early results on these issues go back to Fama (1965), where both the independence of daily returns and their long-tailedness are documented. We now have data at much higher frequencies of observation and report in Table 1 results on S&P 500 futures returns at these frequencies. We focus attention on the level of the observed kurtosis and on χ² goodness of fit tests for normality. We observe from Table 1 that the kurtosis is substantially higher than three, the kurtosis level of a normal distribution. The goodness of fit tests also overwhelmingly reject the hypothesis of normality for returns over short durations. We will note in the next section that this has very significant implications for modeling the dynamics of the price process.
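The two diagnostics discussed above, sample kurtosis and a χ² goodness of fit test against a fitted normal, can be sketched on simulated data. The sketch below is only illustrative: the number of bins, the heavy-tailed sample (a normal with gamma-distributed variance) and all function names are assumptions, not the procedure of the source study.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def sample_kurtosis(xs):
    """Fourth standardized moment; equals 3 for a normal distribution."""
    m = mean(xs)
    s2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / s2 ** 2


def chi_square_normal(xs, n_bins=10):
    """Pearson chi-square statistic against a normal fitted by mean and std.

    Bins are equiprobable under the fitted normal, so each expects n / n_bins.
    """
    fitted = NormalDist(mean(xs), pstdev(xs))
    # Interior bin edges at the quantiles of the fitted normal.
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    counts = [0] * n_bins
    for x in xs:
        counts[sum(x > e for e in edges)] += 1
    expected = len(xs) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Hypothetical heavy-tailed sample: normal with a gamma-distributed variance.
heavy_sample = [rng.gauss(0.0, math.sqrt(rng.gammavariate(0.5, 2.0)))
                for _ in range(20000)]

print(sample_kurtosis(normal_sample), chi_square_normal(normal_sample))
print(sample_kurtosis(heavy_sample), chi_square_normal(heavy_sample))
```

On the heavy-tailed sample both the kurtosis and the χ² statistic should come out far above their values for the Gaussian sample, which is the qualitative pattern reported in Table 1.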

4. Purely Discontinuous Asset Price Processes

Table 1. High frequency tests of normality, S&P 500 futures returns, Nov. 1992–Feb. 1993.

                          1 Min.    15 Min.   Hourly    Daily
Kurtosis                  58.59     13.85     5.97      10.31
χ² test statistic         437.12    931.85    98.323    123.84
χ² critical value (5%)    9.26      5.7       3.57      0.989

Source: Dissertation of Thierry Ané, University of Paris IX Dauphine and ESSEC, 1997.

2.2 Long-tailedness in risk neutral distribution

Apart from the statistical return distribution we are also interested in the risk neutral or pricing distribution as implied by option prices. This distribution assesses the futures price of a binary derivative that pays a dollar at a future date if the stock price is in a certain interval, as opposed to the likelihood of the occurrence of this event. The distribution may be recovered from observed option prices with the density being given by the second derivative of the European call option price, of maturity matching the future date, with respect to the option strike as derived in Ross (1976a) and Breeden and Litzenberger (1978). If the distribution describing the current prices of derivatives written on future stock price events is Gaussian then an implication is that the implied volatility, obtained from equating the option price to the value given by the Black–Scholes formula, should be constant as one varies the strike for a fixed maturity. On the other hand, if this density is symmetric about a point, then the implied volatilities, though no longer necessarily flat with respect to strike, should be symmetric about a point as well. Both these implications are contradicted by what has come to be known as the implied volatility smile.

We present in Table 2 below the implied volatility smile on S&P 500 index options, based on out of the money options using only puts for strikes below, and calls for strikes above, the spot price. These are the more liquid option markets. The time period covered is June 1988 to May 1991 and we focus attention just on the short maturity options. The choice of this focus is motivated by our intention of studying the dynamics of the stock price process, which is but the cumulation of short maturity moves. We observe from Table 2, reading up the columns, that as the strike level rises, the implied volatility falls sharply followed by a smaller rise as one crosses the level of the spot price. We therefore clearly have a smile shape in the short maturity implied volatility, but the left and right sides are not symmetric. We may conclude from these observations that the left tail of the pricing distribution is fatter than the right tail, and this reflects a negative skewness in the distribution. The existence of the smile itself is evidence of excess kurtosis (relative to the normal distribution) in this density.
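The recovery of the pricing density from call prices can be illustrated numerically. The sketch below prices European calls under Black–Scholes (an assumption made only so that the true density is known in closed form), takes a central second difference in the strike, and compares the result with the lognormal density. Parameter values and function names are illustrative; the factor exp(rT) undoes the discounting contained in the call price.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call."""
    sd = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / sd
    d2 = d1 - sd
    return (spot * STD_NORMAL.cdf(d1)
            - strike * math.exp(-rate * maturity) * STD_NORMAL.cdf(d2))


def implied_density(call, strike, rate, maturity, h=0.01):
    """Risk neutral density at `strike` from a central second difference.

    `call` is any function mapping a strike to a call price at this maturity.
    """
    second_diff = (call(strike + h) - 2.0 * call(strike) + call(strike - h)) / h ** 2
    return math.exp(rate * maturity) * second_diff


def lognormal_density(x, spot, rate, vol, maturity):
    """Risk neutral density of the terminal price under Black-Scholes."""
    mu = math.log(spot) + (rate - 0.5 * vol ** 2) * maturity
    sd = vol * math.sqrt(maturity)
    return NormalDist(mu, sd).pdf(math.log(x)) / x


# Hypothetical market parameters.
spot, rate, vol, maturity = 100.0, 0.05, 0.2, 0.25


def call(strike):
    return bs_call(spot, strike, rate, vol, maturity)


for k in (90.0, 100.0, 110.0):
    print(k, implied_density(call, k, rate, maturity),
          lognormal_density(k, spot, rate, vol, maturity))
```

With market quotes in place of `call`, the same second difference would give an estimate of the pricing density, although real quotes are noisy and sparse in strike, so some smoothing would normally be needed first.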


Table 2. The smile in implied volatilities at shorter maturities (below 60 days).

Moneyness        June 1988–    June 1989–    June 1990–
(spot/strike)    May 1989      May 1990      May 1991
                 17.27         16.16         19.70
                 16.21         15.10         18.23
                 16.33         15.83         18.65
                 17.42         17.81         20.87
                 19.04         20.65         22.27
1.06             21.84         25.70         25.57

Source: Bakshi, Cao and Chen, Journal of Finance (1997), page 2015.

2.3 The behavior of extreme moves

Tables 1 and 2 are classical results on the statistical properties of densities associated with price movements in financial markets. They summarize essentially the narrow behavior of the return distribution as may be evidenced by noting that most of the returns considered in the time series analysis are the ones with the smaller magnitudes, and the range of moneyness reported in the implied volatility curves is just within six percentage points over an average period of a month. Hence the evidence presented is that of lack of normality in the neighborhood of the zero return and one might wonder whether at least the tail of the distributions is Gaussian. For the risk neutral distribution this has the implication that the implied volatility curve flattens out as one gets into deep out-of-the-money options on both sides, though the level at which the curves flatten out may be different on each side.

To focus attention on the behavior of the tails of the distribution with a view to addressing whether this may be Gaussian, we consider the behavior of extremes. It is shown in Embrechts, Klüppelberg and Mikosch (1997) that the asymptotic distribution of the maximum and minimum of independent drawings from a Gaussian distribution is given up to shift and scale by the Gumbel distribution. The other possible asymptotic distributions for these extremal events are, again up to shift and scaling, the Weibull and Fréchet distributions. For distributions that have as support the positive half line, the candidate limiting distributions are just the Gumbel and Fréchet distributions.

The analysis of extreme events requires long time series of data and for this purpose we obtained data on daily returns on the Dow–Jones industrial average (DJIA) for the 100 years from 1897 to 1997. Partitioning this data into non-overlapping intervals of 100 days, we constructed a series on the maximum percentage daily rise and the maximum percentage daily drop in the DJIA over the 100 days. We then artificially nested the Gumbel and Fréchet log likelihoods and tested the null hypothesis that the distribution of the extreme event is Gumbel, the limit of the Gaussian tail. Table 3 presents these results.

Table 3. Log-likelihoods of the distribution of extremal price movements: maximum daily percentage rise and fall in the DJIA over 100 day non-overlapping intervals for 100 years.

Maximum daily drop, 100 days
             Gumbel    Fréchet   P-value
1897–1997    768.37    808.58    0.00
1897–1945    380.22    389.98    0.01
1946–1997    409.93    434.74    0.00

Maximum daily rise, 100 days
             Gumbel    Fréchet   P-value
1897–1997    811.66    833.77    0.01
1897–1945    395.79    408.92    0.01
1946–1997    358.33    432.95    0.01

Source: Bakshi and Madan (1998), What is the probability of a stock market crash, working paper, University of Maryland.

Table 3 demonstrates that the normality hypothesis may also be rejected as a model for the tails of the statistical distribution of daily returns. Given the evidence on excess kurtosis, we would conjecture that these tails are heavier than Gaussian and if the property is shared with the risk neutral distribution, as we suspect it is, then implied volatilities must continue to rise as we get deeper out-of-the-money, i.e., the implied volatility curves do not flatten out at either end of the strike range. At this point we do not have documentary evidence on very deep out-of-the-money implied volatilities but observations from current market quotes on S&P 500 index options would suggest that this may well be the case.
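The construction of the extremal series described above can be sketched as follows. The code partitions a return series into non-overlapping blocks and records the largest rise and the largest drop in each block. The simulated Gaussian returns merely stand in for the DJIA data, which are not reproduced here, and the function name is illustrative.

```python
import random


def block_extremes(returns, block=100):
    """Maximum rise and maximum drop over non-overlapping blocks.

    A trailing partial block is discarded. Drops are reported as
    positive magnitudes.
    """
    rises, drops = [], []
    for start in range(0, len(returns) - block + 1, block):
        window = returns[start:start + block]
        rises.append(max(window))
        drops.append(-min(window))
    return rises, drops


rng = random.Random(1)
daily = [rng.gauss(0.0, 0.01) for _ in range(2550)]  # placeholder data
rises, drops = block_extremes(daily)
print(len(rises), max(rises), max(drops))
```

The two resulting series are the inputs to which extreme value distributions such as the Gumbel and Fréchet would then be fitted by maximum likelihood.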

2.4 The structure of the arrival rates of price moves

The arguments of this chapter lead us to consider purely discontinuous processes as models for the dynamics of stock prices. Such processes, when they have independent and identically distributed increments, are characterized by their Lévy densities, which essentially count the rate of arrival of jumps of different sizes. This is a wide class of processes, and structural properties, if supported by data, are beneficial in limiting the class of models that need to be considered. One such structural property is complete monotonicity of the Lévy density, whereby large jumps occur at a smaller rate than small jumps. This is a reasonable property to expect as market participants facing price increases on buy orders and decreases on sell orders have an incentive to minimize these impacts.

Another structural property is the aggregate arrival rate of jumps or moves, which could be finite or infinite. We note in this regard that Brownian motion is an infinite activity process, as the actual sum of absolute price moves is itself infinite for Brownian motion, it being a process of infinite variation. We note further that jump-diffusions employ a compound-Poisson process for the arrival of jumps that have a finite arrival rate with the magnitude of jumps having, once again, a normal distribution. The models we propose in this chapter have infinite arrival rates of jumps and in this regard they are closer to Brownian motion, but unlike Brownian motion they are processes of finite variation. This requires that the integral of the Lévy density be infinite, but the density times the jump size should have a finite integral near zero.

A typical Lévy density meeting these conditions is of the form α exp(−β|x|)/|x|^{1+ρ} for jump size x with ρ > 0. The log arrival rate is in this case linear in the jump size and the log of the jump size, with the coefficient on the log of the jump size being above unity. For ρ > 1 we have infinite variation and ρ = 0 is the case of the gamma process, or in this case the difference of two gamma processes, which we will note later is the VG model. On the other hand, if the jump sizes are exponentially distributed with a finite arrival rate, as postulated for example in Das and Foresi (1996), then the log arrival rates are linear in just the size, with the coefficient on log size being 0, or ρ = −1. In contrast, the log arrival rate of the compound-Poisson process with Gaussian jump sizes (see Cox and Ross (1976)) is linear in the size and the square of the size.
Since the exponential of a negative quadratic shifts from being concave near zero to convex near infinity, such a Lévy density is not completely monotone. A cursory evaluation of these structural properties may be simply made by regressing log arrival rates on the size of jumps, their log and their square. For our 100 year data on daily returns on the DJIA we counted the number of arrivals of jumps in the different size categories and then regressed the log of the empirically observed arrival rate on the size of the jump, its log and its square. For the Cox and Ross (1976) model the log arrival rates have a single representation that is not distinguished by the sign of the jump, while for the Das and Foresi and VG type models the parameters vary with sign, so the latter two model estimates allow for this by separating out the positive and negative moves. Table 4 presents the results of these regressions.

Table 4. Regression of log arrival rates on the sizes of jumps. Standard errors are in parentheses.

Log arrival rates of drops
             Constant         Jump size        Log size        R²
1897–1997    −9.88 (1.44)     −31.6 (8.36)     −1.92 (0.32)    0.97
1897–1945    −8.51 (1.45)     −33.0 (8.53)     −1.65 (0.32)    0.97
1946–1997    −12.35 (2.22)    −32.0 (17.78)    −2.41 (0.45)    0.95

Log arrival rates of rises
             Constant         Jump size        Log size        R²
1897–1997    −11.55 (1.71)    −24.5 (9.10)     −2.25 (0.38)    0.96
1897–1945    −10.29 (1.65)    −25.4 (8.97)     −1.99 (0.37)    0.97
1946–1997    −13.66 (3.23)    −25.8 (24.45)    −2.67 (0.65)    0.93

Arrival rates for jump diffusion
             Constant         Jump size        Size²           R²
1897–1997    −3.66 (0.53)     −1.73 (3.86)     −447 (66)       0.70
1897–1945    −3.36 (0.48)     −1.77 (3.66)     −421 (62)       0.71
1946–1997    −3.17 (0.65)     1.54 (8.98)      −928 (191)      0.64

Source: Bakshi and Madan (1998), What is the probability of a stock market crash, working paper, University of Maryland.

From Table 4 we observe that the coefficient of log size in the first two regressions is significantly different from zero and may even be close to two in absolute value, which definitely argues against a process with a finite arrival rate, as in Das and Foresi (1996). As in a number of cases the coefficient is estimated above two in absolute value, the process may be one of infinite variation. However, we cannot reject the hypothesis that this coefficient is below two and hence we may have a process of finite variation. As will be argued later, there are other reasons for entertaining a finite variation process and in the absence of strong evidence to the contrary we conclude in favor of finite variation processes with infinite arrival rates. Regarding the comparison with the Cox and Ross (1976) process with quadratic log arrival rates, we note that the linear term is in all cases insignificant, suggesting a pure quadratic model, but note further that one explains only up to 70% of the variation in arrival rates compared with up to 97% of the variation using the completely monotone density.
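The regression design used for the completely monotone specification can be sketched on synthetic data. For a Lévy density of the form α exp(−β|x|)/|x|^{1+ρ}, the log arrival rate is ln α − β|x| − (1+ρ) ln|x|, so an ordinary least squares fit on a constant, the jump size and its log should recover ln α, −β and −(1+ρ). The parameter values, the grid of jump sizes and the helper names below are assumptions for illustration; the sketch uses exact rates rather than empirical counts.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least squares coefficients from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def levy_density(x, alpha, beta, rho):
    """alpha * exp(-beta |x|) / |x|^(1 + rho)."""
    return alpha * math.exp(-beta * abs(x)) / abs(x) ** (1.0 + rho)


alpha, beta, rho = 2.0, 30.0, 0.5           # hypothetical parameters
sizes = [0.005 * i for i in range(1, 21)]    # jump sizes from 0.5% to 10%
log_rates = [math.log(levy_density(x, alpha, beta, rho)) for x in sizes]
design = [[1.0, x, math.log(x)] for x in sizes]
const, coef_size, coef_log = ols(design, log_rates)
print(const, coef_size, coef_log)  # approx. ln(alpha), -beta, -(1 + rho)
```

In this parametrization a log size coefficient between −1 and −2 corresponds to 0 < ρ < 1, that is, an infinite arrival rate combined with finite variation, which is the range the text argues for.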


2.5 Summary of empirical observations

We note from Tables 1 and 2 that both the statistical and risk neutral distributions are, for short intervals, not normal distributions. They have significant levels of excess kurtosis and the risk neutral distribution in particular is also skewed to the left, with a left tail heavier than the right tail. This absence of normality continues into the tail of the densities as reflected by an analysis of extremes in Table 3. From Table 4 we infer that a reasonable model could be a pure jump model with an infinite arrival rate (a Lévy density integrating to infinity) and a process of finite variation. We also infer from Table 4 some support for a completely monotone Lévy density. Heavy risk neutral tails, if confirmed, imply that implied volatilities are strictly U-shaped and do not flatten out as one moves deep out of the money in both directions.

3 The implications of economic theory

The consequences of the no arbitrage hypothesis are now recognized to be among the most far reaching implications of economic theory. From early beginnings with Ross' (1976) theory of arbitrage and its application to option pricing by Black and Scholes (1973) and Merton (1973), to the development of the martingale theory of pricing by Harrison and Kreps (1979) and Harrison and Pliska (1981), this hypothesis has yielded many deep and interesting results. We demonstrate in this section a continuation of these lessons and draw out more exactly the implications of this hypothesis for modeling the dynamics of the asset price. Before proceeding we note an important proviso with regard to this hypothesis. Financial markets may display arbitrage opportunities and there are many documented “so-called” anomalies that are suggestive of such a possibility, yet it remains true that models of the price process to be employed in developing derivative pricing models must be free of arbitrage. This is so for the simple reason of preventing traders from arbitraging a firm quoting arbitrageable prices. That models must be arbitrage free goes without question.

3.1 The stochastic process implications of no arbitrage

Four results, one from mathematical finance and the other three from the theory of stochastic processes, form the foundations for the stochastic process implications of the hypothesis of no arbitrage. The first of these results, from mathematical finance, demonstrates that the absence of arbitrage is equivalent to the existence of an equivalent martingale measure. The other results, from the theory of stochastic processes, characterize martingales.


3.1.1 No arbitrage and martingales

This result has many proofs or no proof depending on the context and meaning to be attached to the idea of no arbitrage. In discrete time and with finitely many states there is no ambiguity and the result is true with a proof going back to Harrison and Kreps (1979). At the other extreme we have continuous time and states given, at a minimum, by the relatively large set consisting of the paths of the stock price process. Here the existence of martingale measures easily implies the absence of arbitrage, but the implication in the reverse direction is not available, and this is the direction that concerns us here. Essentially the hypothesis of no arbitrage, merely asserting that one cannot combine a portfolio of existing assets to earn a non-negative, non-zero cash flow at a negative current price, is too weak to deduce the existence of a martingale measure. For interesting counterexamples of economies satisfying no arbitrage and yet not satisfying the existence of a martingale measure the reader is referred to Jarrow and Madan (1998).

In these richer contexts allowing an infinity of dynamic trading strategies, the hypothesis of no arbitrage must be strengthened to permit deduction of a martingale measure. The strengthening required is topological in nature and requires that one not be able to construct an approximation to an arbitrage opportunity in some limiting sense, and then it does follow that there exists an equivalent martingale measure. The first results in this direction are due to Kreps (1981). The difficulty with the result of Kreps (1981) is the weak sense in which the limit is taken, as the definition of approximation lacks a sense of uniformity, and what is regarded as an approximation may not be so from the perspective of other economic agents. The strongest results in this direction are due to Delbaen and Schachermayer (1994). They employ a strong and uniform sense of no arbitrage and show that if there is no random sequence of zero cost trading strategies converging in this strong sense to a non-negative, non-zero cash flow, with the random sequence being uniformly bounded below by a negative constant, then there exists a martingale measure and the converse holds as well. They term this hypothesis No Free Lunch with Vanishing Risk (NFLVR) and prove that it is equivalent to the existence of an equivalent martingale measure.

3.1.2 Martingales and semimartingales

The second important result in ascertaining the stochastic process implications of the hypothesis of no arbitrage is Girsanov’s theorem. This is pointed out by Delbaen and Schachermayer (1994) and amounts to noting that if there exists a change of measure from the true statistical measure P to a martingale measure or risk neutral measure Q such that under Q discounted asset prices are martingales, then it must be that under P the price process was a semimartingale to begin with.


This is a very useful realization as it informs us that models for price processes may safely be restricted to the class of semimartingale processes. Since the class of semimartingales is very wide indeed, one might argue that this is not a very important insight. On the other hand, a lot is known about the structure of semimartingales and for a modeler it is useful to know that the search may be constrained by this structure. Some recent examples of proposals for stock price processes that are not semimartingales include the use of fractional Brownian motion with the arbitrage demonstrated in Rogers (1997). Semimartingales are a difficult concept to communicate in precision, as they go beyond the idea of a simple concept and are in fact a fairly complete and very general theory of random processes, yet given their established importance to the field of mathematical finance today, it is imperative that we communicate some of the flavor of this theory, and do so with brevity. There are at least two approaches, one analytical and the other structural and it is best to consider the structural approach. From this perspective a semimartingale is described by its decomposition into a martingale plus a very general model for the drift of the process. This certainly includes linear drift but also more general models of the drift. One merely requires that this process be of finite and integrable variation, as well as being predictable (i.e. the limit of left continuous functions). Examples include Brownian motion with drift, solutions to stochastic differential equations like the mean reverting Cox, Ingersoll and Ross (1985) interest rate process and the VG model (Madan, Carr and Chang (1998)) with drift to be discussed later in the chapter. 
To appreciate what is not a semimartingale, we consider the discrete time continuous state context studied by Jacod and Shiryaev (1998), where they show that the no arbitrage property is lost, and hence arbitrage arises, if zero is not in the relative interior of the support of the multivariate return distribution over the discrete time step. We also learn from this paper that not all semimartingales are stock price models, as calendar time is a semimartingale with a zero martingale component and would admit arbitrage if it were a price process. The important property is to get zero into the relative interior of the support, at least in discrete time. Price processes must be semimartingales with a non-zero martingale component.

3.1.3 Semimartingales and time changed Brownian motion

The next result we employ in developing our understanding of the stochastic process implications of no arbitrage is a fundamental characterization of all semimartingales, due to Monroe (1978). This remarkable result shows that every semimartingale can be written as a Brownian motion (possibly defined on some adequately extended probability space) evaluated at a random time. This result is somewhat surprising at first, since Brownian motion, even if evaluated at a random time, is suggestive of a martingale and as noted earlier semimartingales include simple linear drifts like time itself. However, this is only a problem at first glance as the time change need not be independent of the Brownian motion and calendar time t, for example, is Brownian motion W(t) evaluated at the first time T(t) at which this same Brownian motion reaches t. By this result the study of price processes is reduced to the study of time changes for Brownian motion and one may consider both independent and dependent time changes.

One might ask what the time change represents. Ignoring price changes that are the possible result of noise or liquidity trades, changes in the price of an asset occur through trades motivated primarily for reasons of information. The cumulated arrival of relevant information is a reasonable, economically meaningful measure of the time change, that gets translated into buy or sell orders. Geman, Madan and Yor (2000) consider many models for the process of buy and sell orders and relate the time change in all these cases to some measure of economic activity. In some cases the measure is just the number of trades while in other cases time is measured by the weighted sum of order arrivals, where the weights vary with the size of the order.

When time is viewed in this economically fundamental manner the question of dependence or independence of the time change becomes an interesting and meaningful question. Certainly, some part of the order process and hence the time change, one would expect, is motivated by observations of the price process. This is the phenomenon of herding or runs on the asset. On the other hand if the market is dominated by independent analysts who view the market price as always providing us with the most efficient and accurate valuation of the asset, i.e. it is a discounted martingale under the right measure, then there is no information to be extracted from prices that the market has not already extracted and so no analysts are motivated in their trades by observations of price movements.
They are bound to seek independent, and as far as possible, private information, as the motivating basis of their trading decisions. This interpretation of the process suggests an independent time change. We also note that from a mathematical modeling viewpoint it would be easier to work with independent time changes, though, as we shall see, there are cases where both representations are possible for the same process. Generally, the independent time change is the more tractable alternative and so far most of our successes come from processes of this type. The broad consistency of this hypothesis with the efficient markets hypothesis is therefore an attractive feature.

3.1.4 Continuous time changes and semimartingales

We come now to the crux of the issue, the continuity of the price process or otherwise. This brings us to the third and final result from the theory of stochastic processes shedding light on the nature of the price process as a consequence of no arbitrage. We note first that as the price process is a time changed Brownian motion, it will be a continuous process essentially only if the time change is continuous. The implications of supposing such continuity in the time change rely on results characterizing continuous semimartingales (Revuz and Yor (1994), page 190). Let X(t) be a continuous semimartingale, be it the price process or the time change. Let V(t) be the quadratic characteristic of the semimartingale X(t), which exists by virtue of X being a semimartingale. In the terminology of Wall Street the process V(t) is akin to the realized total variance on the process X(t). If the process X(t) has a well defined sense of a variance rate per unit time, or equivalently V(t) is differentiable in t, then the quadratic characteristic is absolutely continuous with respect to Lebesgue measure and in this case we may write the process X(t) as a stochastic integral with respect to Brownian motion. Under these conditions there exist processes a(t), b(t) and a standard Brownian motion W(t) such that

\[ X(t) = X(0) + \int_0^t a(s)\,ds + \int_0^t b(s)\,dW(s). \tag{1} \]

Consider now the implications of X(t) being a time change and the price process in turn. If X(t) is a time change, then it is an increasing process and so b(t) must be identically zero. This implies that the time change is locally deterministic with no uncertainty in the local rate of time change, which is then a(t). If we view the time change, as suggested earlier, as a measure of economic activity, proxied by the rate of arrival of information, orders, or size weighted orders, then one would expect some local uncertainty in the time change and this argues against the use of a locally deterministic time change and hence, by implication, a continuous semimartingale as a model for the price process. On the other hand if one views X(t) directly as a price process, the representation (1) argues that the local motion of the stock return must be Gaussian. Given the considerable evidence cited against the likelihood of this possibility, we conclude once again that a continuous semimartingale is not an appropriate model for the price process. Now it is possible that there is a continuous martingale component in the price process in addition to a jump component, as is the case of jump diffusions, but the necessity of introducing such a diffusion term onto a functioning purely discontinuous model must be separately argued for. As we will observe, the latter class of models contains many alternatives capable of approximating very closely the structural characteristics of diffusions.

3.1.5 Summary of the consequences of no arbitrage

We showed in this section that no arbitrage implies, via the existence of an equivalent martingale measure, that the price process is a semimartingale. We then observed that all semimartingales are time changed Brownian motions, with the time change a random increasing process. The resulting process could be continuous only if the time change is locally deterministic. Relating time changes to measures of economic activity with some local uncertainty we argued that the price process was not a continuous process. We also observed that such continuity implies that the process is locally Gaussian, for which we have ample evidence to the contrary, and so once again we concluded that the process cannot be continuous. The remaining sections will take up the issue of modeling using purely discontinuous processes and demonstrate their effectiveness. The need to add on an additional continuous process onto a functioning purely discontinuous process must in our view be argued for on theoretical and empirical grounds. Carr, Geman, Madan and Yor (2000) present evidence to the contrary.
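The idea of a Brownian motion run on an independent, locally random clock, as discussed in section 3.1.3, can be sketched in a few lines. The choice of a gamma process for the clock is an assumption made here for illustration (it is the time change associated with the VG model mentioned in the chapter); the parameter names and values are hypothetical.

```python
import math
import random


def time_changed_path(n_steps, dt, time_var, sigma, rng):
    """Brownian motion evaluated at an independent gamma time change.

    Over each calendar step dt the clock advances by a gamma increment
    with mean dt and variance time_var * dt, so the clock is increasing
    and locally random. Conditional on the clock increment g, the
    Brownian increment is normal with variance sigma^2 * g.
    """
    clock, value = [0.0], [0.0]
    shape, scale = dt / time_var, time_var
    for _ in range(n_steps):
        g = rng.gammavariate(shape, scale)
        clock.append(clock[-1] + g)
        value.append(value[-1] + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0))
    return clock, value


rng = random.Random(2)
clock, value = time_changed_path(n_steps=250, dt=1.0 / 250, time_var=0.2,
                                 sigma=0.2, rng=rng)
print(clock[-1], value[-1])
```

Most clock increments in such a simulation are tiny while a few are large, so the resulting path moves mainly through occasional sizeable jumps rather than through a steady accumulation of small Gaussian steps.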

4 Economic models of finite variation for asset price processes

Statistical and economic analysis suggests that we entertain purely discontinuous price processes with possibly infinite arrival rates, and finite variation. An attractive feature of finite variation processes is that they may be decomposed as the difference of two increasing processes, a property lost in Brownian motion and other processes of infinite variation. This permits, for the first time, a separation of the price process into the process of up ticks and down ticks. Our analysis of optimal contracting in such economies indicates that the major demand for short maturity at-the-money options in such economies arises from a desire on the part of investors to be positioned differently with respect to upward and downward movements in the market, a position not attainable by direct stock investment alone. Hence options, and short maturity at-the-money options in particular, play a fundamental role in such economies: a role that may be consistent with casual observations of high activity in these markets. The next step forward from correctly adjusting one’s delta or stock position is the optimal positioning of the up and down deltas via option trades. To effectively answer these questions it is imperative that we focus attention, separately, on the up and down forces of the market.

We propose here two classes of models accomplishing this objective. The models differ in their primitives and are structurally distinct, yet we show in the next section that under some fairly reasonable conditions, they are in fact equivalent. However, tractability is enhanced by working with both specifications as it can be difficult to find the equivalent formulation from the alternate perspective.

The first class of models takes as primitives two increasing processes that represent cumulated orders to buy and sell at market and models the price responses as these orders are cleared through the limit sell and buy books respectively. Economic activity and the related concepts of economic time reflect cumulated orders of both types in this representation of the price process. We term this class of models the Order Processing Models (OPM).

The second class of models is related to traditional models of dynamic price adjustment with price changes expressed as a function of the level of excess demand in the economy. This response function is termed the force function of the economy as it measures price pressure in its relationship with excess demand. The excess demand itself is modeled by a Brownian motion with the equilibrium points given by the zero set of Brownian motion. Economic time in these models is given by cumulated squared price responses or the realized variance. This class of models we refer to as Dynamic Price Adjustment Models (DPA).

4.1 Prices in the order processing model (OPM)

The primitives in this view of the price process are two increasing processes that represent cumulated market buy orders, $U(t)$, and cumulated market sell orders, $V(t)$. We have noted in our discussion of time changes that increasing random processes with local uncertainty are necessarily purely discontinuous. By taking such increasing random processes as primitives, the fundamental uncertainties of the economy are discontinuous, and prices modeled as market responses to them inherit this property. We define the jump in the process $U$ at time $t$ by $\Delta U(t) = U(t) - U(t^-)$, where we note that the processes are by construction right continuous with left limits, so that $U(t) = \lim_{s \downarrow t} U(s)$ while $U(t^-) = \lim_{s \uparrow t} U(s)$, and likewise for $V(t)$, $V(t^-)$ and $\Delta V(t)$. The property of being increasing and purely discontinuous implies that

$$U(t) = \sum_{s \le t} \Delta U(s), \qquad V(t) = \sum_{s \le t} \Delta V(s),$$

so that the current value of each process is just the sum of all the jumps that have occurred to date. Price changes are modeled in Geman, Madan and Yor (2000) by market responses to these market orders. Here we describe the process of price increases. The magnitude $\Delta U(t)$ is viewed as a buy order at the prevailing price $p(t^-)$, which by construction cannot be accessed. There is a downward sloping demand curve $q_d^u(p(t)/p(t^-), \Delta U(t), t)$ that equals $\Delta U(t)$ at $p(t) = p(t^-)$ and an upward sloping supply curve $q_s^u(p(t)/p(t^-), \Delta U(t), t)$ that is zero at $p(t) = p(t^-)$. The two must be equated to determine both the quantity transacted, $q^u = q_d^u = q_s^u$, and the price response $p(t)$. The solution gives the price response in log form by

$$\ln \frac{p(t)}{p(t^-)} = \Phi_u(\Delta U(t), t).$$

A similar analysis yields the price response to a market sell order, a decline in the log price of magnitude $\Phi_v$:

$$\ln \frac{p(t)}{p(t^-)} = -\Phi_v(\Delta V(t), t).$$

The price process is obtained as an aggregation of the price responses to market buy and sell orders,

$$\ln(p(t)) = \ln(p(0)) + \sum_{s \le t} \Phi_u(\Delta U(s), s) - \sum_{s \le t} \Phi_v(\Delta V(s), s),$$

and is by construction the difference of two increasing processes, and therefore a finite variation process. It is also purely discontinuous in that it is precisely the sum of all its jumps. Geman, Madan and Yor (2000) rewrite such processes in many cases as time changed Brownian motion and study the relationship between the time change and the market primitives, showing that the time change is generally a size weighted sum of the market buy and sell order processes. Hence their interpretation as measures of the level of economic activity.

4.2 The dynamic adjustment model (DPA)

This formulation of the price process begins with a traditional price adjustment model of the form

$$d \ln(p) = f(z(t))\,dt$$

where $z(t)$ is a measure of excess demand and $f$ represents the force by which prices respond to excess demand in the economy. This function we term the force function of the economy. By construction $f(x) \ge 0$ for $x > 0$ and $f(x) \le 0$ for $x < 0$. Excess demand is exogenously modeled as dominated by new information and is given by a Brownian motion $W(t)$. It follows that

$$\ln(p(t)) = \ln(p(0)) + \int_0^t f(W(s))\,ds.$$

Equilibrium times are of course given by the zero set of Brownian motion, and there are arbitrage opportunities to be made during upward or downward rallies by buying or selling and then reversing the trade before the end of the rally. Such intra rally trades are not available to general market participants, whose price access is only at equilibrium times. The restriction to equilibrium times, the zero set of Brownian motion, is accomplished by evaluating the above process at the inverse local time of Brownian motion at zero, $\sigma(t)$. We therefore define

$$\ln(p(t)) = \ln(p(0)) + \int_0^{\sigma(t)} f(W(s))\,ds. \tag{2}$$

This process is once again a purely discontinuous process, inheriting this property from that of inverse local time. It may be decomposed as the difference of two increasing processes

$$\ln\frac{p(t)}{p(0)} = \int_0^{\sigma(t)} f^+(W(s))\,ds - \int_0^{\sigma(t)} f^-(W(s))\,ds$$

where $f^+(x) = f(x)\mathbf{1}_{(x \ge 0)}$ and $f^-(x) = -f(x)\mathbf{1}_{(x \le 0)}$ are both nonnegative, and is a process of finite variation under the condition

$$\int_{-K}^{K} |f(x)|\,dx < \infty \quad \text{for all } K.$$

It is interesting to enquire into the nature of the force function in the economy. For example, if $f(x) > 0$ for all $x > 0$ and $f(x) < 0$ for $x < 0$ then the price process is one with an infinite arrival rate of jumps. On the other hand there are finitely many jumps in any interval if $f(x) = 0$ in a neighborhood of zero. Another interesting question is whether the force is immediately infinite and decreasing for larger excess demands or whether it rises with the level of excess demand. Geman, Madan and Yor (2000) present many explicit solutions that may be employed to answer such questions. They also show that such a process may be written as Brownian motion evaluated at a time change that aggregates the squared price responses and is thereby a measure of realized variance.

5 Prices as Lévy processes

Finite variation asset price processes are by construction the difference of two increasing processes and section 4 has described two classes of economic models that give rise to such processes. We now wish to construct specific examples of such processes that may be evaluated empirically in their adequacy as models for the statistical dynamics of the price process, and as models for the pricing densities reflected in option prices. This statistical evaluation is enhanced if one has effective descriptions of the transition densities for use in maximum likelihood estimation and closed form or otherwise fast and accurate computation methods for the prices of European options when the underlying process is in the described class. Both these objectives are simultaneously met by an analytic closed form for the characteristic function of the log of the stock price at a future date. The density is then easily evaluated by Fourier inversion and maximum likelihood estimation is feasible. Alternatively, one may follow the methods outlined in Madan and Seneta (1989) and estimate parameters by maximum likelihood on transformed variates. Option prices are easily obtained from the characteristic function; this is described in Bakshi and Madan (1998), and a faster algorithm is provided in Carr and Madan (1998). Carr and Madan show how to write analytically the Fourier transform in log strike of an exponentially damped call price, in terms of the characteristic function of the log stock price. The damped call price and call price are then obtained by a single Fourier inversion that may even invoke the fast Fourier transform. The characteristic function of the log stock price is therefore seen as the key to efficient model validation from both a statistical and risk neutral perspective.

5.1 The characteristic function of log price relatives

In constructing alternatives to Brownian motion as models of the fundamental uncertainty driving the stock price, that may meet our requirements of being a purely discontinuous process of finite variation with a possibly infinite arrival rate of shocks, we focus in the first instance on keeping all the properties of Brownian motion except those that must be given up. We are well aware that just as more complex models allowing for stochastic volatility and correlations of various sorts can be constructed out of Brownian motions by combining them in various ways, the same can be done with any candidate process that replaces Brownian motion.

The first property of Brownian motion that we seek to keep is the analytically rich property of being a process of independent increments, identically distributed over non-overlapping intervals of equal lengths of time. This introduces a homogeneity of the base uncertainty across time, that may be altered through parametric shifts in later developments. In any case, for modeling the local motion, homogeneity should be a reasonable hypothesis from at least the perspective of a local approximation that employs some average density of moves, even if the actual ones are state contingent and time varying.

The second property, which we may or may not keep, is that of finite moments of all orders. We are modeling continuously compounded returns and this should in principle be a bounded random variable, even if it is difficult to organize this within a modeling context, and hence the finiteness of moments is really a non-issue. Considerations of analytical tractability may on occasion require us to consider processes with infinite moments, but my priority is to avoid them as far as possible. The theory of stochastic processes has a lot to teach us about processes meeting these conditions.
Such processes are called infinitely divisible and the Lévy–Khintchine theorem (see Feller (1971) and Bertoin (1996)) provides us with a complete characterization of the characteristic function. Specifically, let $X(t) = \log(S(t))$ be the continuous time process for the log of the stock price with mean $\mu t$, and further suppose that $X(t)$ is a finite variation process of independent identically distributed increments. Then there exists a unique measure $\Pi$ defined on $\mathbb{R} - \{0\}$ such that

$$\phi_{X(t)}(u) \stackrel{\text{def}}{=} E\left[\exp(iuX(t))\right] = \exp\left(iu\mu t + t\int_{-\infty}^{\infty}\left(e^{iux} - 1\right)\Pi(dx)\right).$$

The measure $\Pi$ is called the Lévy measure of the process and $X(t)$ is a Lévy process. When the measure has a density $k(x)$, we may write

$$\phi_{X(t)}(u) = \exp\left(iu\mu t + t\int_{-\infty}^{\infty}\left(e^{iux} - 1\right)k(x)\,dx\right) \tag{3}$$

and we refer to the function $k(x)$ as the Lévy density. Heuristically the density $k(x)$ specifies the arrival rate of jumps of size $x$, and the Lévy process $X(t)$ is a compound Poisson process with a finite arrival rate if the integral of the Lévy density is finite. We shall primarily be concerned with Lévy processes with an infinite arrival rate. The Lévy process may always be approximated by a compound Poisson process obtained by truncating the Lévy density in a neighborhood of zero, and using as an arrival rate

$$\lambda = \int_{|x| > \varepsilon} k(x)\,dx$$

and as a density for the jump magnitude conditional on the arrival, the density

$$g(x) = \frac{k(x)\mathbf{1}_{|x| > \varepsilon}}{\lambda}.$$

The convergence occurs as we let $\varepsilon \to 0$. Geman, Madan and Yor (2000) present many examples of candidate Lévy processes that are associated with the two economic models OPM and DPA of section 4.
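
The truncation can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the one-sided density $k(x) = \exp(-x/\nu)/(\nu x)$ is borrowed from the gamma process of section 6, the value $\nu = 0.2$ and the function names are hypothetical choices, and the integral is computed by a plain midpoint rule rather than any method from the cited papers. It shows the arrival rate $\lambda$ of the approximating compound Poisson process growing without bound, roughly like $\ln(1/\varepsilon)/\nu$, as $\varepsilon \to 0$.

```python
import math

def gamma_levy_density(x, nu):
    """One-sided Levy density exp(-x/nu) / (nu * x) for x > 0 (illustrative choice)."""
    return math.exp(-x / nu) / (nu * x)

def arrival_rate(eps, nu, n=20000):
    """lambda(eps) = integral of k(x) over x > eps, midpoint rule in y = ln x."""
    lo, hi = math.log(eps), math.log(50.0 * nu)  # k is negligible beyond 50 * nu
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = math.exp(lo + (i + 0.5) * h)
        total += gamma_levy_density(x, nu) * x * h  # dx = x dy
    return total

nu = 0.2  # assumed value, for illustration only
rates = {eps: arrival_rate(eps, nu) for eps in (1e-2, 1e-4, 1e-6)}
for eps, lam in rates.items():
    print(f"eps = {eps:g}: arrival rate = {lam:.3f}")
```

Each hundredfold reduction of $\varepsilon$ adds close to $\ln(100)/\nu$ to the arrival rate, which is the numerical face of an infinite arrival rate of small jumps.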

5.2 Robustness of finite variation Lévy processes

Continuous time processes with continuous sample paths have a certain lack of robustness best illustrated by considering geometric Brownian motion under two different but close volatilities. Two individuals could perhaps hold such different views on volatility but as a consequence their probability measures are no longer equivalent but are in fact singular. The set of paths receiving probability 1 under one measure has probability 0 under the other measure. The measures are not robust, in the sense of equivalence, to different volatility beliefs. This lack of robustness is really a consequence, not of continuity, but of infinite variation.

Hence, remaining in the class of finite variation processes enhances robustness of the models to heterogeneity of views on various parameters. To appreciate this point we note (Jacod and Shiryaev (1980), page 159) that when two Lévy processes with Lévy densities $k(x)$ and $k'(x)$ are equivalent then there exists a positive measurable function $Y(x)$ such that

$$k'(x) = Y(x)k(x) \tag{4}$$

and

$$\int_{-\infty}^{\infty} \left|(|x| \wedge 1)(Y(x) - 1)\right| k(x)\,dx < \infty. \tag{5}$$

One may rewrite (5) on employing (4) as

$$\int_{k > k'} (|x| \wedge 1)\left(k(x) - k'(x)\right)dx + \int_{k' > k} (|x| \wedge 1)\left(k'(x) - k(x)\right)dx < \infty \tag{6}$$

and observe that on the set $|x| > 1$ the required integrability holds by virtue of the integrability of the Lévy densities on this set. On the set $|x| < 1$ we have the integrability condition

$$\int_{\{k > k'\} \cap \{|x| < 1\}} |x|\left(k(x) - k'(x)\right)dx + \int_{\{k' > k\} \cap \{|x| < 1\}} |x|\left(k'(x) - k(x)\right)dx < \infty$$

and this condition essentially requires that the difference between the two Lévy measures be that of a finite variation process, and holds automatically if both Lévy processes are of finite variation. Hence for finite variation processes, equivalence just requires absolute continuity of the measures with respect to each other, or the condition (4) with no integrability conditions. Restrictions on the ability to change parameters like volatility in geometric Brownian motion follow from the integrability conditions for equivalence and apply to processes with infinite variation. In this regard one may consider the Lévy measure studied in Geman, Madan and Yor (2000) of the form

$$k(x) = \frac{e^{-x}}{x^{2+\alpha}} \quad \text{for } x > 0.$$

For $\alpha > 0$ this process has infinite variation and the parameter generating the infinite variation is $\alpha$. This parameter cannot be changed if equivalence is to be preserved. Specifically, if

$$k'(x) = \frac{e^{-x}}{x^{2+\beta}}$$

for $\alpha \ne \beta$ and $\alpha, \beta > 0$ the two measures are no longer equivalent and it is the integrability condition (5) that fails.
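
The failure of (5) in this example can be checked in a few lines. The calculation below is a sketch that uses only (4), (5) and the two densities just given, writing $k'$ for the second density; no further result from the cited references is assumed.

$$Y(x) = \frac{k'(x)}{k(x)} = x^{\alpha - \beta}, \qquad \int_0^1 x\,\left|x^{\alpha-\beta} - 1\right| \frac{e^{-x}}{x^{2+\alpha}}\,dx = \int_0^1 \left|x^{-1-\beta} - x^{-1-\alpha}\right| e^{-x}\,dx .$$

As $x \downarrow 0$ the integrand behaves like $x^{-1-\max(\alpha,\beta)}$, and since $\max(\alpha, \beta) > 0$ this is not integrable at zero. The integral in (5) therefore diverges whenever $\alpha \ne \beta$, whichever of the two exponents is the larger.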

5.3 Complete monotonicity (CM)

There are of course many Lévy densities that one may employ in modeling the price process. It is therefore useful if the collection of possible choices can be reduced by invoking some structural properties. One such property is that of complete monotonicity. The idea is to require the arrival rates of large jumps to be less than the arrival rates of small jumps. This suggests that $k(x)$ be decreasing in $|x|$, or that the derivative satisfy $k'(x) \le 0$ for $x > 0$ and $k'(x) \ge 0$ for $x < 0$. The first derivative of the Lévy density is therefore of one sign on each side of zero. The property of complete monotonicity requires that all the derivatives, and not just the first, have this property of having the same sign on each side of zero. By a result of Bernstein this property is equivalent to requiring $k(x)$ for $x > 0$ to be the Laplace transform of a positive measure on the positive half line, and similarly for $k(x)$ for $x < 0$. Specifically we require that there exist measures $G_p$ and $G_n$ such that

$$k(x) = \int_0^{\infty} e^{-ax}\,G_p(da) \quad \text{for } x > 0,$$

$$k(x) = \int_0^{\infty} e^{ax}\,G_n(da) \quad \text{for } x < 0.$$

The Lévy density is then a mixture of exponential densities. An important result that follows for such Lévy densities is that the two classes of economic models OPM and DPA are equivalent under the CM property.
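
As a worked illustration, consider a one-sided density of the form $k(x) = C e^{-\lambda x}/x$ for $x > 0$, which is the shape taken by the gamma and variance gamma Lévy densities of section 6. The constants $C, \lambda > 0$ are placeholder symbols introduced here only for the example. Since

$$\int_{\lambda}^{\infty} e^{-ax}\,da = \frac{e^{-\lambda x}}{x},$$

the density has the required representation with the mixing measure $G_p(da) = C\,\mathbf{1}_{(a > \lambda)}\,da$. Differentiating under the integral sign then gives

$$(-1)^n k^{(n)}(x) = C\int_{\lambda}^{\infty} a^n e^{-ax}\,da > 0 \quad \text{for all } n \ge 0,\ x > 0,$$

so the derivatives alternate in sign, which is the usual statement of complete monotonicity on the positive half line.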

5.3.1 Equivalence of OPM and DPA under CM

In particular, for every force function defining the price response under DPA, the resulting price process of equation (2) is a Lévy process with a completely monotone Lévy density. Geman, Madan and Yor (2000) give numerous examples of force functions and their associated Lévy densities. For example, if the force function is $x^m$ for some integer $m > 0$ then the process is one of independent stable increments with index $\alpha = (2 + m)^{-1}$, as follows from Brownian scaling of the integral up to the inverse local time. Conversely, every Lévy process with such a completely monotone Lévy density can be written as the integral of a functional of Brownian motion up to the inverse local time of the Brownian motion. This equivalence result is an application of analytical results from number theory called Krein’s theory, and the specific construction of the force function from the Lévy density and vice versa remains a difficult, if not impossible, task. Specifically, for the variance gamma model that we introduce next, we know the Lévy density quite explicitly but are not aware of what the force function is in this case.

6 The variance gamma model

Purely discontinuous processes of finite variation with infinite arrival rates contain a particularly tractable and parametrically parsimonious subclass of processes that is constructed from two very well known processes, Brownian motion and the gamma process. This is the so-called variance gamma process first studied by Madan and Seneta (1990). The process studied in Madan and Seneta (1990) was the symmetric variance gamma process that is obtained on evaluating Brownian motion at gamma time. An asymmetric risk neutral process was developed by Madan and Milne (1991) by assuming that a Lucas representative agent with power utility had to hold the risk exposure in a symmetric variance gamma process. It was shown in Madan, Carr and Chang (1998) that the resulting risk neutral process was equivalent to evaluating Brownian motion with drift at gamma time. Given the importance of asymmetry or skewness in option pricing, we focus directly on this asymmetric variance gamma process but will refer to it as the variance gamma process. The process is parametrically parsimonious in that only two additional parameters are involved beyond the volatility introduced by Black and Scholes, and these two parameters give us control over skewness and kurtosis, which are precisely the primary concerns in modeling and assessing derivative risks.

6.1 The variance gamma process

Let $Y(t; \sigma, \theta)$ be a Brownian motion with drift $\theta$ and variance rate $\sigma^2$. If $W(t)$ is a standard Brownian motion, we may write the process $Y(t; \sigma, \theta)$ in terms of $W(t)$ as

$$Y(t; \sigma, \theta) = \theta t + \sigma W(t).$$

The variance gamma process is obtained on evaluating the process $Y$ at an independent random time given by a gamma process. For this we define the process $G(t; \nu)$ with independent increments, identically distributed over non-overlapping intervals of length $h$, with the increments, $G(t + h; \nu) - G(t; \nu) = g$, having the gamma density

$$p(g, h) = \frac{g^{h/\nu - 1}\exp(-g/\nu)}{\nu^{h/\nu}\,\Gamma(h/\nu)}.$$

The mean of the gamma density is $h$ and the variance is $\nu h$. Hence the average random time change in $h$ units of calendar time is $h$ and its variance is proportional to the length of the interval. The gamma density is infinitely divisible with characteristic function

$$E\left[\exp(iug)\right] = \left(\frac{1}{1 - iu\nu}\right)^{h/\nu}$$

and the gamma process is an increasing Lévy process with a one sided Lévy density

$$k(x) = \frac{\exp(-x/\nu)}{\nu x}, \quad \text{for } x > 0.$$

Both the gamma process and Brownian motion are highly tractable processes about which a lot is known and each process has seen many domains of application. The variance gamma process is the process $X(t; \sigma, \nu, \theta)$ defined by

$$X(t; \sigma, \nu, \theta) = Y(G(t; \nu); \sigma, \theta) = \theta G(t; \nu) + \sigma W(G(t; \nu)) \tag{7}$$

or Brownian motion with drift $\theta$ and variance rate $\sigma^2$ evaluated at the gamma time $G(t; \nu)$. Apart from the variance rate of the Brownian motion $\sigma^2$, the two other parameters are $\theta$ and $\nu$. We shall observe that it is $\theta$ that generates skewness while kurtosis is primarily controlled by $\nu$.

6.1.1 Characteristic function of the variance gamma process

The characteristic function of the variance gamma process is easily evaluated by conditioning on the gamma process first and then employing the characteristic function of the gamma process itself. It has a simple analytic form of a quadratic raised to a negative power. Specifically,

$$\phi_{X(t)}(u) \stackrel{\text{def}}{=} E\left[\exp(iuX(t))\right] = \left(\frac{1}{1 - iu\theta\nu + \frac{\sigma^2\nu}{2}u^2}\right)^{t/\nu}. \tag{8}$$

The Black–Scholes and Merton model employing Brownian motion is a limiting case of this model since the process converges to Brownian motion with drift as one lets the volatility of the time change $\nu$ tend to zero. This may also be observed from the characteristic function on letting $t/\nu$ tend to infinity as $\nu$ tends to zero and noting that the limit is precisely $\exp(iu\theta t - \sigma^2 u^2 t/2)$, the characteristic function of Brownian motion with drift. We also note that if $\theta$ is zero, the characteristic function is real valued and the process is therefore symmetric and there is no skewness, hence validating the claim that skewness is generated by $\theta \ne 0$. This observation is even clearer once we have constructed the Lévy measure for the VG process.

6.1.2 Moments of the variance gamma process

The moments of the VG process are easily obtained by exploiting the structure of the process or by differentiating the characteristic function. It is shown in Madan, Carr and Chang (1998) that

$$E[X(t)] = \theta t$$

$$E\left[(X(t) - E[X(t)])^2\right] = \left(\theta^2\nu + \sigma^2\right)t$$

$$E\left[(X(t) - E[X(t)])^3\right] = \left(2\theta^3\nu^2 + 3\sigma^2\theta\nu\right)t$$

$$E\left[(X(t) - E[X(t)])^4\right] = \left(3\sigma^4\nu + 12\sigma^2\theta^2\nu^2 + 6\theta^4\nu^3\right)t + \left(3\sigma^4 + 6\sigma^2\theta^2\nu + 3\theta^4\nu^2\right)t^2.$$
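
The moment formulas translate directly into code. The sketch below simply evaluates them and forms the standardized skewness and kurtosis; the function names and the parameter values are illustrative assumptions, not taken from the cited paper.

```python
import math

def vg_central_moments(t, sigma, nu, theta):
    """Mean and central moments of orders 2 to 4 of X(t), from the formulas above."""
    mean = theta * t
    m2 = (theta**2 * nu + sigma**2) * t
    m3 = (2 * theta**3 * nu**2 + 3 * sigma**2 * theta * nu) * t
    m4 = ((3 * sigma**4 * nu + 12 * sigma**2 * theta**2 * nu**2
           + 6 * theta**4 * nu**3) * t
          + (3 * sigma**4 + 6 * sigma**2 * theta**2 * nu
             + 3 * theta**4 * nu**2) * t**2)
    return mean, m2, m3, m4

def skewness_kurtosis(t, sigma, nu, theta):
    """Standardized third and fourth moments of X(t)."""
    _, m2, m3, m4 = vg_central_moments(t, sigma, nu, theta)
    return m3 / m2**1.5, m4 / m2**2

# Assumed example values: unit horizon, sigma = 0.2, nu = 0.25.
skew_sym, kurt_sym = skewness_kurtosis(1.0, 0.2, 0.25, 0.0)
skew_neg, kurt_neg = skewness_kurtosis(1.0, 0.2, 0.25, -0.1)
print("theta = 0.0 :", skew_sym, kurt_sym)
print("theta = -0.1:", skew_neg, kurt_neg)
```

With $\theta = 0$ and $t = 1$ the kurtosis reduces to $3(1 + \nu)$, and a negative $\theta$ produces negative skewness.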

We observe again that skewness is zero if $\theta = 0$. Furthermore, in the case of $\theta = 0$ we have that the fourth central moment divided by the square of the second central moment, or the kurtosis, is $3(1 + \nu/t)$, that is $3(1 + \nu)$ over a unit time interval. This leads to the interpretation that the parameter $\nu$ controls kurtosis and is in fact (for $\theta = 0$ and $t = 1$) the percentage excess kurtosis over the kurtosis of the normal distribution, which is three.

6.1.3 The variance gamma process as a process of finite variation

The variance gamma process is a finite variation process and the two increasing processes whose difference is the variance gamma process are both gamma processes. This is observed by considering two independent gamma processes $\gamma_p(t)$ and $\gamma_n(t)$ with mean rates $\mu_p$, $\mu_n$ and variance rates $\nu_p$, $\nu_n$ respectively for the positive and negative components. The characteristic functions of the two gamma processes are

$$E\left[\exp(iu\gamma_k(t))\right] = \left(\frac{1}{1 - iu\,\nu_k/\mu_k}\right)^{\mu_k^2 t/\nu_k} \quad \text{for } k = p, n.$$

Supposing that the two gamma processes have the same coefficients of variation and $\nu_k/\mu_k^2 = \nu$ for $k = p, n$, we may write the characteristic function of the difference of the two gamma processes as

$$E\left[\exp\left(iu(\gamma_p(t) - \gamma_n(t))\right)\right] = \left(\frac{1}{1 - iu\left(\frac{\nu_p}{\mu_p} - \frac{\nu_n}{\mu_n}\right) + u^2\,\frac{\nu_p}{\mu_p}\frac{\nu_n}{\mu_n}}\right)^{t/\nu}.$$

The result follows on comparing this characteristic function with that of the variance gamma process and defining the mean and variance rates of the two gamma processes to be differenced accordingly. Specifically

$$\mu_p = \frac{1}{2}\sqrt{\theta^2 + \frac{2\sigma^2}{\nu}} + \frac{\theta}{2}, \qquad \mu_n = \frac{1}{2}\sqrt{\theta^2 + \frac{2\sigma^2}{\nu}} - \frac{\theta}{2},$$

$$\nu_p = \mu_p^2\,\nu, \qquad \nu_n = \mu_n^2\,\nu.$$
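
The identification can be checked numerically. The following sketch evaluates the VG characteristic function (8) and the product of the two gamma characteristic functions at a few arguments; the function names and parameter values are illustrative assumptions.

```python
import math

def vg_cf(u, t, sigma, nu, theta):
    """VG characteristic function, equation (8)."""
    return (1.0 / (1.0 - 1j * u * theta * nu
                   + 0.5 * sigma**2 * nu * u**2)) ** (t / nu)

def gamma_cf(u, t, mu, var_rate):
    """Characteristic function of a gamma process with mean rate mu
    and variance rate var_rate, evaluated at time t."""
    return (1.0 / (1.0 - 1j * u * var_rate / mu)) ** (mu**2 * t / var_rate)

def gamma_components(sigma, nu, theta):
    """Mean and variance rates of the positive and negative gamma processes."""
    root = math.sqrt(theta**2 + 2.0 * sigma**2 / nu)
    mu_p = 0.5 * root + 0.5 * theta
    mu_n = 0.5 * root - 0.5 * theta
    return mu_p, mu_n, mu_p**2 * nu, mu_n**2 * nu

t, sigma, nu, theta = 1.0, 0.2, 0.2, -0.1   # assumed example values
mu_p, mu_n, nu_p, nu_n = gamma_components(sigma, nu, theta)
gaps = []
for u in (0.5, 2.0, 10.0):
    # gamma_p enters with +u, while -gamma_n contributes the cf at -u
    product = gamma_cf(u, t, mu_p, nu_p) * gamma_cf(-u, t, mu_n, nu_n)
    gaps.append(abs(vg_cf(u, t, sigma, nu, theta) - product))
print(mu_p - mu_n, gaps)
```

The difference of the mean rates reproduces $\theta$, and the two characteristic functions agree to rounding error.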

6.1.4 The Lévy density for the variance gamma process

The Lévy density for the variance gamma process is easily constructed from its representation as the difference of two gamma processes using the well known form for the Lévy density of the gamma process. It follows that the Lévy density of the variance gamma process is

$$k_X(x) = \begin{cases} \dfrac{1}{\nu}\,\dfrac{\exp\left(-\frac{\mu_n}{\nu_n}|x|\right)}{|x|} & \text{for } x < 0 \\[2ex] \dfrac{1}{\nu}\,\dfrac{\exp\left(-\frac{\mu_p}{\nu_p}x\right)}{x} & \text{for } x > 0. \end{cases}$$

The basic form of the Lévy density is that of a negative exponential scaled by the reciprocal of the jump size. Just as in the gamma process, the integral of the Lévy density is infinite and the process is therefore a finite variation process with infinite arrival rates of jumps. It is helpful to write the Lévy density in terms of the original parameters of the process and this leads to the expression

$$k_X(x) = \frac{\exp\left(\theta x/\sigma^2\right)}{\nu|x|}\exp\left(-\frac{\sqrt{2/\nu + \theta^2/\sigma^2}}{\sigma}\,|x|\right). \tag{9}$$

The special case of $\theta = 0$ is a symmetric Lévy measure and hence the absence of skew. Negative values of $\theta$ give a fatter left tail and induce negative skewness. We also observe that as $\nu$ is increased the rate of exponential decay in the Lévy measure is reduced, thus raising the arrival rate of jumps of the larger size. This induces the higher kurtosis related to this parameter. The two additional parameters therefore give direct control of the two moments that data analysis indicates we need to be able to control.

6.1.5 The return density for the variance gamma process

The density of $X(t; \sigma, \nu, \theta)$ is available in closed form and is derived in Madan, Carr and Chang (1998). This is a closed form in that it is expressible in terms of the special functions of mathematics, in particular the modified Bessel function of the second kind. Specifically we have that the density of $X(t) = x$ given $X(0) = 0$, $h(x, t; \sigma, \nu, \theta) = h(x)$, is

$$h(x) = \frac{2\exp\left(\theta x/\sigma^2\right)}{\nu^{t/\nu}\sqrt{2\pi}\,\sigma\,\Gamma(t/\nu)}\left(\frac{x^2}{2\sigma^2/\nu + \theta^2}\right)^{\frac{t}{2\nu} - \frac{1}{4}} K_{\frac{t}{\nu} - \frac{1}{2}}\left(\frac{1}{\sigma^2}\sqrt{x^2\left(\frac{2\sigma^2}{\nu} + \theta^2\right)}\right). \tag{10}$$

There are three terms in the density: an exponential, a real power and the modified Bessel function. This is useful for maximum likelihood estimation of parameters from time series and it is also useful in providing density plots of results. Later we report on closed forms for option prices and this incorporates a closed form for the cumulative distribution function as well, which may be used to determine critical values for extreme points in value at risk calculations.
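
The density (10) can be evaluated with nothing beyond the standard library if the Bessel function is computed from its integral representation $K_\lambda(z) = \int_0^\infty e^{-z\cosh s}\cosh(\lambda s)\,ds$. The sketch below does this by a trapezoid rule and then checks, on a grid, that the density integrates to one and reproduces the mean $\theta t$ and variance $(\theta^2\nu + \sigma^2)t$. The quadrature settings, the grid and the parameter values are assumptions made for the illustration; a production implementation would use a library Bessel routine.

```python
import math

def bessel_k(order, z, n=1200, s_max=12.0):
    """Modified Bessel function of the second kind, by trapezoid quadrature of
    K_order(z) = integral over s in (0, inf) of exp(-z cosh s) cosh(order s)."""
    h = s_max / n
    total = 0.5 * math.exp(-z)           # s = 0 endpoint with weight 1/2
    for i in range(1, n + 1):
        s = i * h
        total += math.exp(-z * math.cosh(s)) * math.cosh(order * s)
    return total * h

def vg_density(x, t, sigma, nu, theta):
    """Closed-form VG density of X(t) at x != 0, equation (10)."""
    a = t / nu
    q = 2.0 * sigma**2 / nu + theta**2
    front = 2.0 * math.exp(theta * x / sigma**2) / (
        nu**a * math.sqrt(2.0 * math.pi) * sigma * math.gamma(a))
    power = (x * x / q) ** (0.5 * a - 0.25)
    return front * power * bessel_k(a - 0.5, math.sqrt(x * x * q) / sigma**2)

t, sigma, nu, theta = 1.0, 0.2, 0.2, -0.1   # assumed example values
dx = 0.01
grid = [-1.5 + (i + 0.5) * dx for i in range(300)]   # midpoints, never x = 0
dens = [vg_density(x, t, sigma, nu, theta) for x in grid]
total = sum(d * dx for d in dens)
mean = sum(x * d * dx for x, d in zip(grid, dens))
var = sum((x - mean) ** 2 * d * dx for x, d in zip(grid, dens))
print(total, mean, var)
```

Agreement of the numerical mean and variance with the moment formulas of section 6.1.2 is a useful consistency check on any implementation of (10).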

6.2 The stock price process driven by a VG process

We replace Brownian motion in the classical formulation of the geometric Brownian motion model by the VG process and define the risk neutral process for the stock price $S(t)$ by

$$S(t) = S(0)\exp\left(rt + X(t; \sigma, \nu, \theta) + \frac{t}{\nu}\ln\left(1 - \theta\nu - \frac{\sigma^2\nu}{2}\right)\right) \tag{11}$$

where $r$ is the constant continuously compounded interest rate. Observe from the characteristic function of the VG process that

$$E\left[\exp(X(t))\right] = \phi_{X(t)}(-i) = \left(\frac{1}{1 - \theta\nu - \sigma^2\nu/2}\right)^{t/\nu} = \exp\left(-\frac{t}{\nu}\ln\left(1 - \theta\nu - \frac{\sigma^2\nu}{2}\right)\right)$$

and hence the mean rate of return on the stock, under the risk neutral process, is the interest rate by construction. We note further that the limit as $\nu$ tends to zero of $\frac{1}{\nu}\ln(1 - \theta\nu - \sigma^2\nu/2)$ is by L’Hôpital’s rule $-\theta - \sigma^2/2$ and so for small $\nu$ this term is $-\theta t - \sigma^2 t/2$. Noting that $X(t) = \theta G(t) + \sigma W(G(t))$ but for small $\nu$, $G(t)$ is essentially $t$, we get that

$$\ln S(t) = \ln S(0) + \left(r - \frac{\sigma^2}{2}\right)t + \sigma W(t)$$

or the familiar geometric Brownian motion model for the log of the stock price. Hence we have a generalization of the Black–Scholes and Merton models for the stock price. The generalization has introduced two new parameters $\nu$, $\theta$ that we have observed give us control over skewness and kurtosis in the process.
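
The martingale property built into (11) can be illustrated by simulation. The sketch below draws $X(t)$ by first sampling the gamma time and then a normal variate, as in (7), and checks that the discounted mean stock price is close to $S(0)$. The seed, sample size and parameter values are arbitrary assumptions for the example, and the function names are hypothetical.

```python
import math
import random

def simulate_vg(t, sigma, nu, theta, rng):
    """One draw of X(t) = theta*G + sigma*W(G), with G gamma distributed
    with mean t and variance nu*t (shape t/nu, scale nu)."""
    g = rng.gammavariate(t / nu, nu)
    return theta * g + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0)

def compensator(t, sigma, nu, theta):
    """(t/nu) * ln(1 - theta*nu - sigma^2*nu/2), the last term in (11)."""
    return (t / nu) * math.log(1.0 - theta * nu - 0.5 * sigma**2 * nu)

rng = random.Random(12345)                  # fixed seed for reproducibility
t, r, s0 = 1.0, 0.05, 100.0                 # assumed example values
sigma, nu, theta = 0.2, 0.2, -0.1
omega_t = compensator(t, sigma, nu, theta)
n = 200_000
mean_s = sum(
    s0 * math.exp(r * t + simulate_vg(t, sigma, nu, theta, rng) + omega_t)
    for _ in range(n)
) / n
discounted_mean = mean_s * math.exp(-r * t)
print(discounted_mean)                      # should be close to s0
```

Dropping the compensating term from the exponent would bias the discounted mean away from $S(0)$ by the factor $(1 - \theta\nu - \sigma^2\nu/2)^{-t/\nu}$.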

6.2.1 Characteristic function of the log of the stock price

The characteristic function of $\ln(S(t))$ is easily derived from that of $X(t)$, and is useful in deriving option prices by Fourier methods. Specifically we have that

$$\phi_{\ln(S(t))}(u) \stackrel{\text{def}}{=} E\left[\exp(iu\ln(S(t)))\right] = \exp\left(iu\left[\ln(S(0)) + rt + \frac{t}{\nu}\ln\left(1 - \theta\nu - \frac{\sigma^2\nu}{2}\right)\right]\right)\phi_{X(t)}(u) \tag{12}$$

where $\phi_{X(t)}(u)$ is the characteristic function of the VG process given in (8).

6.3 Variance gamma option pricing

When the risk neutral process for the stock is described by the variance gamma process for the log of stock price as in equation (11), European call options on stock of strike $K$ and maturity $t$ have a price, $c(S(0); K, t)$, that is given by evaluating the expected discounted cash flow

$$c(S(0); K, t) = E\left[e^{-rt}\max(S(t) - K, 0)\right]. \tag{13}$$

This valuation result is an application of the defining property of a risk neutral probability, that traded asset prices, when discounted by the value of the money market account, are martingales under this probability. The valuation result follows on noting that option prices at maturity equal the promised payoff. The computation of the call price in equation (13) is accomplished in closed form in Madan, Carr and Chang (1998). Other approaches at efficient computation employ Fourier inversion as described in Bakshi and Madan (1998) or improvements thereof as explained in Carr and Madan (1998). We present here a brief summary of these results. The reader is referred to the original papers for further details.⁴

⁴ Matlab programs are available for performing these computations in all the three ways described here.

6.3.1 The Madan, Carr and Chang closed form

The method employed by Madan, Carr and Chang (1998) to develop a closed form for the VG option price relies on integrating the Black–Scholes formula applied to a random gamma time, with respect to the gamma density for this time. This approach requires the explicit computation of expressions of the form

$$\Psi(a, b, \gamma) = \int_0^{\infty} N\left(\frac{a}{\sqrt{u}} + b\sqrt{u}\right)\frac{u^{\gamma - 1}\exp(-u)}{\Gamma(\gamma)}\,du, \tag{14}$$

where $N(x)$ is the cumulative distribution function of the standard normal variate. The call option price can be explicitly computed in terms of this $\Psi$ function. Specifically we have that

$$c(S(0); K, t) = S(0)\,\Psi\left(d\sqrt{\frac{1 - c_1}{\nu}},\ (\alpha + s)\sqrt{\frac{\nu}{1 - c_1}},\ \gamma\right) - K\exp(-rt)\,\Psi\left(d\sqrt{\frac{1 - c_2}{\nu}},\ \alpha\sqrt{\frac{\nu}{1 - c_2}},\ \gamma\right)$$

where

$$s = \frac{\sigma}{\sqrt{1 + \left(\frac{\theta}{\sigma}\right)^2\frac{\nu}{2}}}, \qquad \alpha = -\frac{\theta}{\sigma\sqrt{1 + \left(\frac{\theta}{\sigma}\right)^2\frac{\nu}{2}}}, \qquad \gamma = \frac{t}{\nu},$$

$$c_1 = \frac{\nu(\alpha + s)^2}{2}, \qquad c_2 = \frac{\nu\alpha^2}{2},$$

$$d = \frac{\ln\left(\frac{S(0)}{K}\right) + rt}{s} + \frac{\gamma}{s}\ln\left(\frac{1 - c_1}{1 - c_2}\right).$$
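
The idea behind this closed form, namely integrating a Black–Scholes type price against the gamma density of the random time, can also be carried out by brute-force quadrature. The sketch below is not the closed form above and does not use the $\Psi$ function: it conditions on the gamma time $g$, under which $\ln S(t)$ in (11) is normal, and integrates the conditional price by a midpoint rule. The function names, the integration cutoff, the number of nodes and the parameter values are all assumptions made for the illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, k, r, t, sigma):
    """Black-Scholes call price, used as the small-nu benchmark."""
    sd = sigma * math.sqrt(t)
    d1 = (math.log(s0 / k) + r * t + 0.5 * sd * sd) / sd
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d1 - sd)

def vg_call_by_mixing(s0, k, r, t, sigma, nu, theta, n=4000):
    """VG call price: conditional Black-Scholes type price given gamma time g,
    integrated against the gamma density by the midpoint rule."""
    a = t / nu                                    # gamma shape, scale is nu
    omega_t = a * math.log(1.0 - theta * nu - 0.5 * sigma**2 * nu)
    g_max = t + 12.0 * math.sqrt(nu * t) + 12.0 * nu   # crude upper cutoff
    h = g_max / n
    log_norm = -a * math.log(nu) - math.lgamma(a)
    price = 0.0
    for i in range(n):
        g = (i + 0.5) * h
        dens = math.exp((a - 1.0) * math.log(g) - g / nu + log_norm)
        sd = sigma * math.sqrt(g)
        # conditional mean of ln(S(t)/K) given the gamma time g
        m = math.log(s0 / k) + r * t + omega_t + theta * g
        d2 = m / sd
        d1 = d2 + sd
        fwd = s0 * math.exp(r * t + omega_t + theta * g + 0.5 * sd * sd)
        price += math.exp(-r * t) * (fwd * norm_cdf(d1) - k * norm_cdf(d2)) * dens * h
    return price

skewed = vg_call_by_mixing(100.0, 100.0, 0.05, 0.5, 0.12, 0.2, -0.14)
near_bs = vg_call_by_mixing(100.0, 100.0, 0.05, 0.5, 0.12, 0.01, 0.0)
bs = black_scholes_call(100.0, 100.0, 0.05, 0.5, 0.12)
print(skewed, near_bs, bs)
```

With $\theta = 0$ and a small $\nu$ the mixed price is close to the Black–Scholes price, in line with the limiting argument of section 6.2.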

A reduction of the $\Psi$ function (14) to the special functions of mathematics is accomplished in terms of the modified Bessel function of the second kind and the degenerate hypergeometric function of two variables with integral representation (Humbert (1920))

$$\Phi(\alpha, \beta, \gamma; x, y) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\gamma - \alpha)}\int_0^1 u^{\alpha - 1}(1 - u)^{\gamma - \alpha - 1}(1 - ux)^{-\beta}e^{uy}\,du.$$

Explicitly we have that

$$\begin{aligned} \Psi(a, b, \gamma) ={}& \frac{c^{\gamma + \frac{1}{2}}\exp(\operatorname{sign}(a)c)(1 + u)^{\gamma}}{\sqrt{2\pi}\,\Gamma(\gamma)\,\gamma}\, K_{\gamma + \frac{1}{2}}(c)\,\Phi\left(\gamma, 1 - \gamma, 1 + \gamma; \frac{1 + u}{2}, -\operatorname{sign}(a)c(1 + u)\right) \\ &- \operatorname{sign}(a)\frac{c^{\gamma + \frac{1}{2}}\exp(\operatorname{sign}(a)c)(1 + u)^{1 + \gamma}}{\sqrt{2\pi}\,\Gamma(\gamma)(1 + \gamma)}\, K_{\gamma - \frac{1}{2}}(c)\,\Phi\left(1 + \gamma, 1 - \gamma, 2 + \gamma; \frac{1 + u}{2}, -\operatorname{sign}(a)c(1 + u)\right) \\ &+ \operatorname{sign}(a)\frac{c^{\gamma + \frac{1}{2}}\exp(\operatorname{sign}(a)c)(1 + u)^{\gamma}}{\sqrt{2\pi}\,\Gamma(\gamma)\,\gamma}\, K_{\gamma - \frac{1}{2}}(c)\,\Phi\left(\gamma, 1 - \gamma, 1 + \gamma; \frac{1 + u}{2}, -\operatorname{sign}(a)c(1 + u)\right) \end{aligned}$$

where

$$c = |a|\sqrt{2 + b^2}, \qquad u = \frac{b}{\sqrt{2 + b^2}}.$$

Madan, Carr and Chang (1998) go on to employ this closed form in a detailed study of the empirical properties of VG option pricing, noting in particular the importance of skewness from the risk neutral viewpoint, and the ability of the VG model to flatten the implied volatility smile in option pricing.

6.3.2 Inversion of distribution function transforms (Bakshi and Madan)

Bakshi and Madan (1998) show that very generally one may write a call option price in the form

$$c(S(0); K, t) = S(0)\Pi_1 - K\exp(-rt)\Pi_2$$

where $\Pi_1$ and $\Pi_2$ are complementary distribution functions obtained on computing the integrals

$$\Pi_1 = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty}\operatorname{Re}\left[\frac{e^{-iuk}\phi_{\ln(S(t))}(u - i)}{iu\,\phi_{\ln(S(t))}(-i)}\right]du$$

$$\Pi_2 = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty}\operatorname{Re}\left[\frac{e^{-iuk}\phi_{\ln(S(t))}(u)}{iu}\right]du$$

where $k = \ln(K)$ and $\phi_{\ln(S(t))}(u)$ is the characteristic function of the log of the stock price given in this case by (12). Bakshi and Madan (2000) study the general spanning properties of the characteristic functions and their relationship to the spanning properties of options. They also express the general relationships between the two probability elements in option pricing, providing a discussion of cases where they are analytically linked in their transforms.

6.3.3 Inversion of the modified call price (Carr and Madan)

Carr and Madan (1998) define the Fourier transform of the modified call price by

$$\psi(v) = \int_{-\infty}^{\infty} e^{ivk + \alpha k}\,c(S(0); e^k, t)\,dk$$

where $k = \ln(K)$, and the multiplication by $\exp(\alpha k)$ for $\alpha > 0$ dampens the call price for negative values of log strike. They show generally that

$$\psi(v) = \frac{e^{-rt}\phi_{\ln(S(t))}(v - (\alpha + 1)i)}{\alpha^2 + \alpha - v^2 + i(2\alpha + 1)v}.$$

The call option price may then be obtained on a single Fourier inversion of $\psi$ that may also employ the fast Fourier transform to evaluate

$$c(S(0); K, t) = \frac{\exp(-\alpha k)}{\pi}\int_0^{\infty} e^{-ivk}\psi(v)\,dv,$$

where the real part of the integral is understood. Carr and Madan (1998) also consider other strategies for speeding up the pricing of options using the characteristic function of the log of the stock price, and the methods should be useful for a variety of Lévy processes.
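
A minimal sketch of this inversion for the VG model follows. It evaluates the integral by a plain trapezoid rule on a truncated grid rather than by the fast Fourier transform, so it prices one strike at a time; the damping values, the truncation point, the grid size and the model parameters are assumptions chosen for the illustration. Two checks are natural: the price should not depend on the damping coefficient $\alpha$, and for $\theta = 0$ and small $\nu$ it should approach the Black–Scholes price.

```python
import cmath
import math

def vg_log_price_cf(u, s0, r, t, sigma, nu, theta):
    """Characteristic function of ln S(t), equation (12); u may be complex."""
    omega_t = (t / nu) * math.log(1.0 - theta * nu - 0.5 * sigma**2 * nu)
    drift = math.log(s0) + r * t + omega_t
    base = 1.0 - 1j * u * theta * nu + 0.5 * sigma**2 * nu * u * u
    return cmath.exp(1j * u * drift) * base ** (-t / nu)

def carr_madan_call(s0, strike, r, t, sigma, nu, theta,
                    alpha=1.5, v_max=300.0, n=6000):
    """Call price from the damped-call transform psi(v), trapezoid quadrature."""
    k = math.log(strike)
    h = v_max / n
    total = 0.0
    for j in range(n + 1):
        v = j * h
        phi = vg_log_price_cf(v - (alpha + 1.0) * 1j, s0, r, t, sigma, nu, theta)
        psi = math.exp(-r * t) * phi / (
            alpha**2 + alpha - v * v + 1j * (2.0 * alpha + 1.0) * v)
        weight = 0.5 if j in (0, n) else 1.0      # trapezoid end weights
        total += weight * (cmath.exp(-1j * v * k) * psi).real
    return math.exp(-alpha * k) / math.pi * total * h

args = (100.0, 100.0, 0.05, 0.5, 0.12, 0.2, -0.14)   # s0, K, r, t, sigma, nu, theta
price_a = carr_madan_call(*args, alpha=1.0)
price_b = carr_madan_call(*args, alpha=1.5)
small_nu = carr_madan_call(100.0, 100.0, 0.05, 0.5, 0.12, 0.01, 0.0)
print(price_a, price_b, small_nu)
```

The damping coefficient must be small enough that the $(\alpha + 1)$-th moment of the stock price is finite, which for the VG model requires $1 - (\alpha + 1)\theta\nu - \sigma^2\nu(\alpha + 1)^2/2 > 0$.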

6.4 Results on option pricing performance

The variance gamma option pricing model was tested in Madan, Carr and Chang (1998) on data for S&P 500 options for the period January 1992 to September 1994. It was noted there that the skew is significant and the three parameter process effectively eliminates the smile in option prices in the direction of moneyness. The pricing errors are generally between 1 and 3 percent for options on the relatively liquid stocks and indices. The maturities we work with get fairly small and are as low as a couple of days at times, while the range of strikes is quite wide and may be up to 20 to 30% out-of-the-money. Yet on this wide range of strikes and low maturities the model provides adequate fits. Here we provide some illustrations of the results for options on the SPX and Nikkei indices. Figures 1 and 2 provide graphs of the prices of out-of-the-money options on these two indices along with the theoretical price curve as fit by the VG model. For strikes above at-the-money the options are calls while puts are used for the strikes below the spot. The typical V shaped price structure observed in markets is basically consistent with that of the negative exponential in the absolute value of the size of the move, that is the local structure of the VG model. The difficulty for Gaussian based models is precisely the fact that for these models option prices of out-of-the-money options fall off too rapidly, being a negative exponential in the square of the move, compared to market. We observe here that the essential structure of price decay is consistent with the building block of completely monotone Lévy densities, the double negative exponential.

7 Asset allocation in L´evy systems Apart from the successes of L´evy processes in option pricing, and the V G model in particular, these processes are associated with financial markets that are incomplete with respect to dynamic trading in the stock and the money market account. In such economies, with stock prices driven by an infinite arrival finite variation L´evy process, European options are market completing assets and one may study the

138

D. B. Madan

Fig. 1. Out-of-the-money option prices on the SPX index and the price curve as fit by the VG model.

Fig. 2. Out-of-the-money option prices on the Nikkei Index and the price curve fit by the VG model.

4. Purely Discontinuous Asset Price Processes

question of the optimal demand for these assets by investors. In contrast, in the traditional economy, where options are redundant assets, there is no demand for these assets. With these observations in mind, Carr, Jin and Madan (2000) reformulate the Merton problem of optimal consumption and investment, except that now the asset space is genuinely expanded to include the European options on the stock of all strikes and maturities as well. They study the problem of optimal derivative investment and solve it in closed form for HARA utility when the statistical and risk neutral price processes are in the VG class of processes. They also show that the shape of the optimal financial derivative product is independent of preferences, time horizons and the mean rate of return on the stock, factors that influence the level of investor demand but not the shape. The shape depends primarily on the comparison between the prices of market moves and the relative frequency of their occurrence. Their analysis also suggests that demand would be highest for at-the-money low maturity options in such economies, a fact that is in accord with casual market observations.

7.1 Optimal derivative investment

Consider an economy trading a stock with price process $S(t)$ that is a homogeneous Lévy process in the interval $[0, \Upsilon]$ with a Lévy density $k_P(x)$ defined over the real line, where $x$ represents the jumps in the log of the stock price. An example is provided by the VG process of equation (11). Also trading in the economy are options on this stock with strikes $K > 0$ and maturities $T < \Upsilon$. The prices of these options are given by the processes $c(S(t); K, T)$ for $t < T$, where these prices are consistent with the absence of arbitrage and are derived in line with martingale pricing methods using a risk neutral measure under which the price process is again a homogeneous Lévy process, with Lévy density $k_Q(x)$. The subscripts $P$ and $Q$ make the important distinction between the statistical price process and the risk neutral process, with the former assessing the relative frequency of events while the latter assesses their prices.

In such an economy we wish to study the question of optimal derivative investment. At first glance, and in analogy with the solution methods adopted in Merton (1971), this is a particularly difficult problem that is not going to be tractable from an analytical perspective. This is because we ask for the optimal positions in a doubly indexed continuum of assets, viz. the options of all strikes $K > 0$ and maturities $T > t$, in a context in which many of these options (i.e. those with maturities below $t$) are expiring on us. Furthermore, the analytical pricing of these options is generally a complex exercise reflecting all the difficulties associated with the kinked option payoff.

For reasons of tractability, we reformulate the problem with the focus on the real uncertainty, which is the jump in log price of the stock, $x$. We view investment, not as a decision on what assets to hold, but in the first instance as a design problem where the investor wishes to design the optimal response of his or her wealth to market moves represented by $x$. Hence we seek to determine the optimal wealth response function $w(x, u)$, which is the jump in the investor's log wealth if the market were to jump at time $u$ by the amount $x$ in the log price of the stock. The actual investment in options that delivers this optimal wealth response is a secondary problem that may be solved numerically using the spanning properties of options. The structure and solution of this secondary problem are described in further detail in Carr, Jin and Madan (2000).

From the perspective of the optimal design of wealth responses, the optimal derivative investment problem may be formulated as a Markov control problem. Carr, Jin and Madan (2000) consider both the infinite time horizon problem with intermediate consumption and the finite horizon problem with no intermediate consumption. Here we present just the former. We denote by $c(t)$ the path of the flow rate of consumption per unit time and suppose the investor has a preference ordering over consumption paths represented by expected utility evaluated as

\[
u = E^P\left[\int_0^\infty \exp(-\beta s)\,U(c(s))\,ds\right] \tag{15}
\]

where $P$ is the statistical probability measure, $\beta$ is the pure rate of time preference, and $U(c)$ is the instantaneous utility function. The investor wishes to choose the consumption path $c(\cdot)$ and the wealth response design $w(\cdot)$ with a view to maximizing $u$. The investor is constrained by a budget constraint that describes the evolution of wealth. The transition equation for the wealth $W(t)$ is the integral equation

\[
\begin{aligned}
W(t) = W(0) &+ \int_0^t r\,W(s_-)\,ds - \int_0^t c(s)\,ds \\
&+ \int_0^t \int_{-\infty}^{\infty} W(s_-)\left(e^{w(x,u)} - 1\right)\left(m(\omega; dx, ds) - k_Q(x)\,dx\,ds\right),
\end{aligned} \tag{16}
\]

and the budget constraint requires that the wealth process be non-negative, $W(t) \ge 0$ almost surely. The first two terms of the wealth transition are standard and require no explanation, accounting for interest earnings and the financing of the consumption stream. The final term involves integration with respect to two measures. The first is the integer valued random measure $m(\omega; dx, ds)$, a Dirac delta measure counting the jumps of various sizes that occur at various times. The second is the pricing Lévy measure $k_Q(x)\,dx\,ds$. The integration with respect to $m$ accounts for the wealth changes actually experienced by the response design $w(x, u)$. The integration with respect to $k_Q(x)\,dx\,ds$ accounts for the cost of this wealth response access, which must be paid for through time.

The wealth transition equation (16) may be rewritten in a form more directly comparable to Merton's original equation by writing

\[
\begin{aligned}
W(t) = W(0) &+ \int_0^t r\,W(s_-)\,ds - \int_0^t c(s)\,ds \\
&+ \int_0^t \int_{-\infty}^{\infty} W(s_-)\left(e^{w(x,u)} - 1\right)\left(k_P(x)\,dx\,ds - k_Q(x)\,dx\,ds\right) \\
&+ \int_0^t \int_{-\infty}^{\infty} W(s_-)\left(e^{w(x,u)} - 1\right)\left(m(\omega; dx, ds) - k_P(x)\,dx\,ds\right)
\end{aligned} \tag{17}
\]

where we have just added and subtracted the integral of the wealth change with respect to the measure k P (x)d xds. In this formulation the final integral in equation (17) is a martingale under the statistical measure P and matches the term representing the martingale component of stock investment in Merton (1971). The first two terms are the same as in Merton (1971). The third term matches the term that evaluates excess returns from stock investment in Merton (1971). Here excess returns are the expected wealth change less the cost or price of this change whereas in Merton we have µ − r. The investor’s optimal derivative investment problem is to choose c(·), w(·), with a view to maximizing the utility u of equation (15) subject to the budget constraint of equation (16).

7.2 Optimal design of wealth responses

Let $J(W)$ be the optimized expected utility when the initial wealth is $W(0) = W$. It is shown in Carr, Jin and Madan (2000) that the optimal wealth response function for the infinite time horizon problem is homogeneous in time and satisfies the equation

\[
\frac{J_W\left(W e^{w(x)}\right)}{J_W(W)} = \frac{k_Q(x)}{k_P(x)}. \tag{18}
\]

This condition has an intuitive interpretation when it is rewritten as

\[
\frac{J_W\left(W e^{w(x)}\right) k_P(x)}{k_Q(x)} = J_W(W),
\]

which states that the expected marginal utility per initial dollar spent on cash in each state, $x$, is equalized across states. If this is not the case then $w(x)$ should be altered to move funds from states with a lower marginal utility to states with a higher marginal utility. Alternatively, the marginal rate of transformation in utility

between two states must equal the marginal rate of transformation in markets between the same two states. The optimal wealth response $w(x)$ is then determined from equation (18), if we know the function $J(W)$, through

\[
W e^{w(x)} = J_W^{-1}\left(J_W(W)\,\frac{k_Q(x)}{k_P(x)}\right).
\]

We learn from this representation that the optimal wealth response design is a possibly smooth function $J_W^{-1}$ applied to the ratio of two finite variation, infinite arrival rate Lévy measures. Such Lévy measures are kinked by construction at zero, where the arrival rate goes to infinity. It follows that one would expect to see this property inherited by $w(x)$. This has the implication that, at a minimum, optimal wealth response design positions investors with different slopes of their desired wealths with respect to up and down market movements from at-the-money. Equivalently, there is a demand for short maturity at-the-money options.

7.2.1 HARA VG financial products

In the special case when the statistical and risk neutral processes are in the VG class and the utility function $U(c)$ is in the HARA (hyperbolic absolute risk aversion) class of utility functions, the optimal derivative investment problem of section 7.1 is shown in Carr, Jin and Madan (2000) to have a closed form solution where $J(W)$ is also in the HARA class of utility functions. The kinks in optimal designs discussed generally in section 7.2 can now be explicitly computed for this case. Specifically, suppose the statistical Lévy measure is symmetric and given by

\[
k_P(x) = \frac{1}{\kappa |x|}\exp\left(-\sqrt{\frac{2}{\kappa}}\,\frac{|x|}{s}\right) \tag{19}
\]

where $\kappa$ is the volatility of the statistical gamma time change for a symmetric Brownian motion with volatility $s$. Further suppose that the risk neutral Lévy measure is as given by (9) with parameters $\sigma$, $\nu$ and $\theta$. Let the utility function be

\[
U(c) = \frac{\gamma}{1-\gamma}\left(\frac{\alpha}{\gamma}\,c - A\right)^{1-\gamma}.
\]

In this case, defining

\[
\zeta = \frac{\theta}{\sigma^2}, \qquad
\lambda = \frac{1}{s}\sqrt{\frac{2}{\kappa}} - \frac{1}{\sigma}\sqrt{\frac{2}{\nu} + \frac{\theta^2}{\sigma^2}},
\]

and letting $R$ denote the price relative of asset price post jump to its pre jump value, the optimal product takes the form

\[
f(R) =
\begin{cases}
R^{-\frac{\zeta+\lambda}{\gamma}} & \text{for } R > 1, \\[4pt]
R^{-\frac{\zeta-\lambda}{\gamma}} & \text{for } R < 1,
\end{cases} \tag{20}
\]

and the kink at-the-money is present unless $\lambda = 0$. The shape of this product is independent of the floor of the utility function and depends primarily on the statistical and risk neutral Lévy measures and on risk aversion as represented by $\gamma$. We also observe the clear impact of risk aversion on optimal product design. As we raise $\gamma$, the effect on the optimal wealth response $f(R)$ is to flatten out its movement and to let the payoff approach that of a bond, thereby reflecting a lack of tolerance for movements in wealth. A variety of possible shapes can arise for the optimal product and these are illustrated in Figures 3–6 for a variety of settings on the statistical and risk neutral parameters. Each figure reports three curves, for varying levels of relative risk aversion (RRA), and the flattening out of the response as we raise risk aversion is apparent in each case. Since these graphs draw optimal portfolio values against the level of the spot asset they are referred to as spot slides.

Fig. 3. Optimal spot slides in the presence of excess risk neutral kurtosis and skew.
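The closed form of equation (20) is easy to evaluate numerically. The sketch below is only an illustration: the function names are hypothetical, the parameter values are made up rather than taken from the chapter's figures, and the expression for λ is the one read from the definitions above (difference of the statistical and risk neutral exponential decay rates). It shows the kink at R = 1 and the flattening of the slide as γ increases.

```python
import math

def vg_slide_exponents(sigma, nu, theta, s, kappa, gamma):
    """Exponents of R in equation (20): (up-move exponent, down-move exponent).

    zeta and lam follow the definitions in the text as read here (an assumption
    of this sketch): zeta = theta / sigma^2 and
    lam = sqrt(2/kappa)/s - sqrt(2/nu + theta^2/sigma^2)/sigma.
    """
    zeta = theta / sigma**2
    lam = math.sqrt(2.0 / kappa) / s - math.sqrt(2.0 / nu + theta**2 / sigma**2) / sigma
    return -(zeta + lam) / gamma, -(zeta - lam) / gamma

def optimal_slide(R, sigma, nu, theta, s, kappa, gamma):
    """Evaluate f(R) of equation (20); by construction f(1) = 1."""
    up, down = vg_slide_exponents(sigma, nu, theta, s, kappa, gamma)
    return R**up if R >= 1.0 else R**down

if __name__ == "__main__":
    # Hypothetical inputs: risk neutral (sigma, nu, theta), statistical (s, kappa).
    params = dict(sigma=0.2, nu=0.5, theta=-0.2, s=0.15, kappa=0.05)
    for gamma in (2.0, 5.0, 10.0):
        row = [optimal_slide(R, gamma=gamma, **params) for R in (0.9, 1.0, 1.1)]
        print(gamma, [round(v, 4) for v in row])
```

With these inputs λ is positive, so the slide falls away on both sides of R = 1, an inverted V in the spirit of the case described for Figure 3; raising `gamma` pulls both branches toward the flat payoff of a bond.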

Fig. 4. Optimal spot slide for a strong skew and a mild excess kurtosis.

In Figure 3 the excess risk neutral kurtosis and skew lead to large moves being priced high relative to their likelihood; hence the optimal spot slide shorts these events and we have an inverted V shape for the spot slide. For Figure 4 the skew is strong and the kurtosis is mild. This leads to falls being overpriced while rises are underpriced. The optimal slide is basically long the asset, but the positioning with respect to rises, the up delta, differs from the positioning with respect to falls, the down delta. For Figure 5 we have an excess statistical volatility, making large moves relatively cheap securities. This gives rise to the V shaped optimal position. Figure 6 is the reverse of the situation of Figure 4. The direction of the skew has been reversed and leads to a basically short position, with the kink induced by the behavior of the Lévy densities at the origin.

8 Spot slide calibration and position measures

The inputs for constructing an optimal spot slide are fairly simple and require just the specification of the statistical or time series moments of the return distribution, from which one may infer κ and s, the statistical Lévy measure parameters. The next step is to obtain data on market option prices, preferably for short

Fig. 5. Optimal spot slide when statistical volatility dominates risk neutral volatility

Fig. 6. Optimal spot slide for a positive skewness.

Fig. 7. Optimal spot slide as calibrated to a book of derivatives on an index.

maturity options and then to estimate the risk neutral Lévy measure and the three parameters σ, ν and θ. Finally, making some assumption on the coefficient of relative risk aversion in a power utility function gives us γ, and we are ready to graph the optimal spot slide describing how one should currently be positioned in the derivatives markets. For contrast, one may compare this with the actual spot slide that aggregates a trader's derivatives book and draws the response curve of the book value to market moves.

We present here the results of calibrating optimal spot slides to data on actual spot slides. In the calibration we allowed for a reverse engineering of the coefficient of risk aversion γ, as there is no other way to estimate this quantity. However, we also observed that the risk neutral excess kurtosis ν is typically an order of magnitude above its statistical counterpart κ, and so we allowed κ to be reverse engineered as well. Such an approach is defensible on noting that the variance of kurtosis estimates is of the order of the eighth moment and, as the time series involved are not very long, generally two to four years, there is some leeway in an appropriate choice of this magnitude. The other parameters, σ, ν, θ and s, are taken at their estimated values. For a variety of underlying assets and on a number of days, we reverse engineered the values of γ and κ so as to match the optimal spot slide with the actual spot slide observed for that day. Remarkably, we were able in many cases to come close to actual spot slides by just a simple choice of these two parameters (γ, κ).
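One way to see why two parameters can suffice is that, under equation (20), the log of the slide is piecewise linear in the log of the price relative, with one slope for up moves and one for down moves. Two slopes determine the two unknowns (γ, κ) once σ, ν, θ and s are fixed. The sketch below carries out this inversion on synthetic data. It is an illustration under stated assumptions, not the calibration procedure used in the chapter (which is not spelled out): the function names are hypothetical, the observed slide is assumed positive and normalised to 1 at R = 1, and the formula for λ is the one read from section 7.2.1.

```python
import math

def slide_value(R, up_exp, down_exp):
    """Piecewise power slide of the form in equation (20), equal to 1 at R = 1."""
    return R**up_exp if R >= 1.0 else R**down_exp

def slope_through_origin(xs, ys):
    """Least-squares slope b of the no-intercept fit y = b * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def implied_gamma_kappa(spots, values, sigma, nu, theta, s):
    """Back out (gamma, kappa) from a slide observed at price relatives `spots`.

    Assumes theta != 0 (so zeta != 0), that the fitted slopes do not sum to
    zero, and that there is at least one observation on each side of R = 1.
    """
    logs = [(math.log(R), math.log(v)) for R, v in zip(spots, values)]
    up = [(x, y) for x, y in logs if x > 0.0]
    down = [(x, y) for x, y in logs if x < 0.0]
    a_up = slope_through_origin(*zip(*up))      # estimates -(zeta + lam) / gamma
    a_down = slope_through_origin(*zip(*down))  # estimates -(zeta - lam) / gamma
    zeta = theta / sigma**2
    gamma = -2.0 * zeta / (a_up + a_down)
    lam = -gamma * (a_up - a_down) / 2.0
    # lam = sqrt(2/kappa)/s - sqrt(2/nu + theta^2/sigma^2)/sigma, solved for kappa.
    c = lam + math.sqrt(2.0 / nu + theta**2 / sigma**2) / sigma
    kappa = 2.0 / (s * c) ** 2
    return gamma, kappa

if __name__ == "__main__":
    # Synthetic "actual" slide generated from known exponents (illustrative numbers only).
    spots = [0.80, 0.90, 0.95, 1.05, 1.10, 1.20]
    values = [slide_value(R, -6.5, 9.0) for R in spots]
    print(implied_gamma_kappa(spots, values, sigma=0.2, nu=0.5, theta=-0.2, s=0.15))
```

With real book data the slide will not be an exact power law on each side, so the two fitted slopes are only approximations and a direct least-squares fit over (γ, κ) may be preferable.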

Figure 7 presents an example of an optimal spot slide as calibrated to an actual spot slide on a book of derivatives on an index. The ratio of κ to ν is referred to as β in the graph and describes the relative excess kurtosis of the subjective and risk neutral densities. Though it is often fairly small when calibrated, it is often an order of magnitude above the ratio of the statistical excess kurtosis to the risk neutral excess kurtosis.

Once all these parameters have been estimated and, importantly, γ and κ have been inferred from data on the actual spot slide, one may infer a personalized risk neutral density. The subjective Lévy measure, determined by the parameters $s$ and $\kappa$ as described by equation (19), is transformed by the marginal utility process as described in Madan and Milne (1991) to obtain the personalized risk neutral Lévy measure $k_I(x)$ (the subscript $I$ being indicative of an individualized measure),

\[
k_I(x) = \exp(-\gamma x)\,\frac{1}{\kappa |x|}\exp\left(-\sqrt{\frac{2}{\kappa}}\,\frac{|x|}{s}\right). \tag{21}
\]

The Lévy measure (21) is that of a VG process with personalized values for $\sigma_I$, $\nu_I$, $\theta_I$ given by

\[
\begin{aligned}
\sigma_I &= \frac{s}{\sqrt{1 - \gamma^2 s^2 \kappa / 2}}, \\
\theta_I &= -\gamma\,\frac{s^2}{1 - \gamma^2 s^2 \kappa / 2}, \\
\nu_I &= \kappa.
\end{aligned} \tag{22}
\]
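The claim that (21) is again a VG Lévy measure can be checked numerically. The sketch below assumes the standard three parameter VG Lévy density of Madan, Carr and Chang (1998), taken here to coincide with equation (9) of the chapter, which lies outside this section. The personalized parameters in the code are obtained by matching the exponential tilt and decay rate of (21) to that form; they are derived for this sketch and should be checked against equation (22). All function names are hypothetical.

```python
import math

def vg_levy_density(x, sigma, nu, theta):
    """Standard VG Lévy density for parameters (sigma, nu, theta); an assumed form."""
    rate = math.sqrt(2.0 / nu + theta**2 / sigma**2) / sigma
    return math.exp(theta * x / sigma**2 - rate * abs(x)) / (nu * abs(x))

def personalized_density(x, gamma, s, kappa):
    """Equation (21): the symmetric statistical density of (19) tilted by exp(-gamma x)."""
    return math.exp(-gamma * x - math.sqrt(2.0 / kappa) * abs(x) / s) / (kappa * abs(x))

def personalized_vg_params(gamma, s, kappa):
    """Return (sigma_I, nu_I, theta_I) such that the VG density reproduces (21)."""
    denom = 1.0 - gamma**2 * s**2 * kappa / 2.0
    if denom <= 0.0:
        # Equivalent to the tilt exp(-gamma x) overwhelming the exponential decay.
        raise ValueError("requires gamma^2 * s^2 * kappa < 2")
    sigma_i = s / math.sqrt(denom)
    theta_i = -gamma * s**2 / denom
    return sigma_i, kappa, theta_i

if __name__ == "__main__":
    gamma, s, kappa = 3.0, 0.15, 0.05  # hypothetical values
    sigma_i, nu_i, theta_i = personalized_vg_params(gamma, s, kappa)
    for x in (-0.10, -0.02, 0.02, 0.10):
        lhs = personalized_density(x, gamma, s, kappa)
        rhs = vg_levy_density(x, sigma_i, nu_i, theta_i)
        print(x, math.isclose(lhs, rhs, rel_tol=1e-10))
```

The guard on `denom` reflects the requirement that the tilted measure still decays in the left tail, which bounds the risk aversion that can be paired with given statistical parameters.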

We thus infer a personalized risk neutral process and this may be employed to construct a personalized return density that we term a position measure, as it is reverse engineered from derivative positions being viewed as optimal and therefore reflects preferences and beliefs that are obtained by a revealed preference exercise. All three densities are in the VG class of processes. On completing this reverse engineering task we have available a statistical return density estimated from the time series of the return data, a risk neutral density as inferred from options data, and a position density as reverse engineered from the actual spot slide of the derivatives book. Figures 8, 9, 10 and 11 present a range of samples of graphs of these densities on a variety of underlying assets. We observe a fairly diverse set of shapes of the densities, with varying degrees of skewness and kurtosis as reflected in the size of tails on the left and the right of the distribution. Furthermore, generally the position density is closer to the statistical density than the risk neutral density, reflecting the view that traders

Fig. 8. Statistical, risk neutral and position densities for the SPX.

Fig. 9. Statistical, risk neutral and position densities for RUT.

Fig. 10. Statistical, risk neutral and position densities for the MSH.

Fig. 11. Statistical, risk neutral and position densities for the DRG.

respect probability calculation as inferred from time series, and position themselves accordingly given the market prices of market moves as reflected in the risk neutral distribution. Occasionally, however, as in the case of Figure 9 the position density may be skewed further to the left than even the risk neutral density and is reflective of greater risk aversion on the part of the trader than is prevalent in the market.

9 Conclusion

We argue here that the empirical evidence indicates that the statistical and risk neutral price processes for financial assets belong to the class of purely discontinuous processes of finite variation, albeit ones of high activity, as reflected by an infinite arrival rate of jumps. Structurally, the pattern of jump arrival rates is consistent with the hypothesis of complete monotonicity, whereby arrival rates at smaller size levels are higher. Economic considerations of the absence of arbitrage point in the same direction by demonstrating that a semimartingale, the candidate no arbitrage price process, is a time changed Brownian motion and that the increasing random process of the time change is of necessity purely discontinuous, if it is not locally deterministic.

The attribute of finite variation is attractive from two perspectives. The first is that it allows a separation of the up and down tick modeling of the market, and we offer two representations of such price processes that are related under complete monotonicity of the Lévy density. The second attractive feature of finite variation is its robustness, as reflected in its tolerance of parametric heterogeneity without the resulting measures being singular or disjoint in their sets of almost sure outcomes. A lack of such robustness is an inherent property of infinite variation processes and we strongly advocate against the use of these processes as models for the price process unless there is overwhelming evidence in support of such a choice.

The class of stationary processes of independent and identically distributed increments meeting our requirements is characterized as a subclass of Lévy processes. Within this class, an important and analytically rich example is provided by Brownian motion time changed by a gamma process, which combines in an interesting way two processes that are well studied in their own right. We summarize the properties of the resulting process, termed the variance gamma process. The process has two additional parameters that enable it to combat skew and kurtosis. Option pricing under the variance gamma process is tractable using a variety of methods and we outline three such methods. The first is a closed form in terms of the modified Bessel function of the second kind and the degenerate hypergeometric function of two variables. The second involves two Fourier inversions for the complementary distribution function and the third employs direct Fourier inversion for the call price using the fast Fourier transform. The results of estimations are

illustrated for data on SPX and Nikkei Index options. It is observed that the model eliminates the smile in the strike direction, effectively using its two additional parameters for this purpose.

Infinite arrival rate, finite variation Lévy processes with completely monotone Lévy densities are processes for the stock price for which options are market completing assets that are part of the primary assets of the economy, with a genuine demand for these assets by investors. We study the Merton problem of optimal consumption and investment with the asset space expanded to include out-of-the-money European options as investment vehicles. For HARA utility and VG statistical and risk neutral processes this problem is solved in closed form, with optimal portfolios that are kinked at-the-money and display a different slope with respect to upward and downward movements of the market. The positions reflect a role for at-the-money short maturity options, the most liquid end of the options market in practice.

Using our theory of optimal derivative positioning we illustrate how one may reverse engineer the preferences and beliefs of traders from observed spot slides of the derivatives book. This allows us to infer personalized risk neutral densities from observations on positions and we term this density the position density. Illustrations of the statistical, risk neutral and position densities are provided for comparative purposes. It is observed that position densities are generally closer to the statistical density and lie between the statistical and risk neutral densities. At times, however, they may be more skewed than the risk neutral density, reflecting risk aversion that dominates market risk aversion.

Acknowledgment

I would like to thank all my co-authors for all the hard work on the various aspects of this project. They are, in approximate chronological order, Eugene Seneta, Frank Milne, Eric Chang, Peter Carr, Helyette Geman, Marc Yor and Gurdip Bakshi. The support and encouragement offered by Claudia Albanese, Marco Avellanada, Joseph Cherian, Carl Chiarella, Jaksa Cvitanić, Nicole El Karoui, Hans Föllmer, Robert Jarrow, Yuri Kabanov, Ioannis Karatzas, Vadim Linetsky, Vincent Lacoste, Eckhardt Platen, Marc Pinsky, Stan Pliska, Phillip Protter, Raymond Rishel, Martin Schweizer, Steve Shreve, Meté Soner, and Thaleia Zariphopoulou is also greatly appreciated. Finally I would like to acknowledge the assistance and guidance I have received from my co-workers at Morgan Stanley Dean Witter. They are Doug Bonard, Steven Chung, Georges Courtadon, Peter Fraenkel, Santiago Garcia, George George, Kevin Holley, Ajay Khanna, Harry Mendell, and Lisa Polsky. Any remaining errors are solely my responsibility.

References

Bakshi, G. and Chen, Z. (1997), An alternative valuation model for contingent claims, Journal of Financial Economics 44, 123–65.
Bakshi, G. and Madan, D.B. (2000), What is the probability of a stock market crash, Working Paper, University of Maryland.
Bakshi, G. and Madan, D.B. (1998), Spanning and derivative security valuation, Journal of Financial Economics 55, 205–38.
Bates, D. (1996), Jumps and stochastic volatility: exchange rate processes implicit in Deutschmark options, The Review of Financial Studies 9, 69–108.
Bertoin, J. (1996), Lévy Processes, Cambridge University Press, Cambridge.
Breeden, D. and Litzenberger, R. (1978), Prices of state contingent claims implicit in option prices, Journal of Business 51, 621–51.
Black, F. and Scholes, M. (1973), The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–54.
Carr, P., Geman, H., Madan, D.B. and Yor, M. (2000), The fine structure of asset returns: an empirical investigation, forthcoming in the Journal of Business.
Carr, P., Jin, X. and Madan, D.B. (2000), Optimal investment in derivative securities, forthcoming in Finance and Stochastics.
Carr, P. and Madan, D.B. (1999), Option valuation using the fast Fourier transform, Journal of Computational Finance 4, 61–73.
Cox, J.C., Ingersoll, J.E. and Ross, S.A. (1985), A theory of the term structure of interest rates, Econometrica 53, 385–408.
Cox, J. and Ross, S.A. (1976), The valuation of options for alternative stochastic processes, Journal of Financial Economics 3, 145–66.
Das, S. and Foresi, S. (1996), Exact solutions for bond and options prices with systematic jump risk, Review of Derivatives Research 1, 7–24.
Delbaen, F. and Schachermayer, W. (1994), A general version of the fundamental theorem of asset pricing, Mathematische Annalen 300, 520–63.
Derman, E. and Kani, I. (1994), Riding on a smile, Risk 7, 32–9.
Dupire, B. (1994), Pricing with a smile, Risk 7, 18–20.
Embrechts, P., Kluppelberg, C. and Mikosch, T. (1997), Modeling Extremal Events, Springer-Verlag, Berlin.
Fama, E.F. (1965), The behavior of stock market prices, Journal of Business 38, 34–105.
Feller, W.E. (1971), An Introduction to Probability Theory and its Applications, 2nd edition, Wiley, New York.
Geman, H., Madan, D.B. and Yor, M. (2000), Time changes for Lévy processes, forthcoming in Mathematical Finance.
Harrison, J.M. and Kreps, D. (1979), Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
Harrison, J.M. and Pliska, S.R. (1981), Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and Their Applications 11, 215–60.
Heston, S.L. (1993), A closed-form solution for options with stochastic volatility with applications to bond and currency options, The Review of Financial Studies 6, 327–43.
Hull, J. and White, A. (1987), The pricing of options on assets with stochastic volatility, Journal of Finance 42, 281–300.
Humbert, P. (1920), The confluent hypergeometric functions of two variables, Proceedings of the Royal Society of Edinburgh 73–85.
Jacod, J. and Shiryaev, A. (1998), Local martingales and the fundamental asset pricing theorems in the discrete-time case, Finance and Stochastics 3, 259–73.
Jacod, J. and Shiryaev, A. (1980), Limit Theorems for Stochastic Processes, Springer-Verlag, Berlin.
Jarrow, R.A. and Madan, D. (2000), Martingales and private monetary values, forthcoming in Journal of Risk.
Kreps, D. (1981), Arbitrage and equilibrium in economies with infinitely many commodities, Journal of Mathematical Economics 8, 15–35.
Madan, D.B., Carr, P. and Chang, E. (1998), The variance gamma process and option pricing, European Finance Review 2, 79–105.
Madan, D.B. and Milne, F. (1991), Option pricing with VG martingale components, Mathematical Finance 1, 39–55.
Madan, D.B. and Seneta, E. (1989), Characteristic function estimation using maximum likelihood on transformed variables, Journal of the Royal Statistical Society ser. B, 51, 281–5.
Madan, D.B. and Seneta, E. (1990), The variance gamma (V.G.) model for share market returns, Journal of Business 63, 511–24.
Merton, R.C. (1971), Optimum consumption and portfolio rules in a continuous time model, Journal of Economic Theory 3, 373–413.
Merton, R.C. (1973), Theory of rational option pricing, Bell Journal of Economics and Management Science 4, 141–83.
Merton, R.C. (1976), Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, 125–44.
Monroe, I. (1978), Processes that can be embedded in Brownian motion, The Annals of Probability 6, 42–56.
Naik, V. and Lee, M. (1990), General equilibrium pricing of options on the market portfolio with discontinuous returns, The Review of Financial Studies 3, 493–522.
Press, J.S. (1967), A compound events model for security prices, Journal of Business 40, 317–35.
Revuz, D. and Yor, M. (1994), Continuous Martingales and Brownian Motion, Springer-Verlag, Berlin.
Rogers, C. (1997), Arbitrage with fractional Brownian motion, Mathematical Finance 7, 95–105.
Ross, S.A. (1976a), Options and efficiency, Quarterly Journal of Economics 90, 75–89.
Ross, S.A. (1976b), Arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–60.

5 Latent Variable Models for Stochastic Discount Factors

René Garcia and Éric Renault

1 Introduction

Latent variable models in finance have traditionally been used in asset pricing theory and in time series analysis. In asset pricing models, a factor structure is imposed on a collection of asset returns to describe their joint distribution at a point in time, while in time series, the dynamic behavior of a series of multivariate returns depends on common factors for which a time series process is assumed. In both cases, the fundamental role of factors is to reduce the number of correlations between a large set of variables. In the first case, the dimension reduction is cross-sectional, in the second longitudinal. Factor analysis postulates that there exists a number of unobserved common factors or latent variables which explain observed correlations. To reduce dimension, a conditional independence is assumed between the observed variables given the common factors.

Arbitrage pricing theory (APT) is the standard financial model where returns of an infinite sequence of risky assets with a positive definite variance–covariance matrix are assumed to depend linearly on a set of common factors and on idiosyncratic residuals. Statistically, the returns are mutually independent given the factors. Economically, the idiosyncratic risk can be diversified away to arrive at an approximate linear beta pricing: the expected return of a risky asset in excess of a risk-free asset is equal to the scalar product of the vector of asset risks, as measured by the factor betas, with the corresponding vector of prices for the risk factors.

The latent GARCH factor model of Diebold and Nerlove (1989) best illustrates the type of time series model used to characterize the dynamic behavior of a set of financial returns. All returns are assumed to depend on a common latent factor and on noise. A longitudinal dimension reduction is achieved by assuming that the factor captures and subsumes the dynamic behavior of returns.¹ The imposed statistical structure is a conditional absence of correlation between the factor and the noise terms, given the whole past of the factor and the noise, while the conditional variance of the factor follows a GARCH structure. This autoregressive conditional variance structure is important for financial applications such as portfolio allocations or value-at-risk calculations.

¹ A cross-sectional dimension reduction is also achieved if the variance–covariance matrix of residuals is assumed to be diagonal.

In this chapter, we aim at providing a unifying analysis of these two strands of literature through the concept of stochastic discount factor (SDF). The SDF $m_{t+1}$, also called the pricing kernel, discounts future payoffs $p_{t+1}$ to determine the current price $\pi_t$ of assets:

\[
\pi_t = E[m_{t+1}\,p_{t+1} \mid J_t], \tag{1.1}
\]

conditional on the information set at time $t$, $J_t$. We summarize in Section 2 the mathematics of the SDF in a conditional setting according to Hansen and Richard (1987). Practical implementation of an asset pricing formula like (1.1) requires a statistical model to characterize the joint probability distribution of $(m_{t+1}, p_{t+1})$ given $J_t$. We specify in Section 3 a dynamic statistical framework to condition the discounted payoffs on a vector of state variables. Assumptions are made on the joint probability distribution of the SDF, asset payoffs and state variables to provide a state-space modeling framework which extends standard models. Beta pricing relations amount to characterizing a vector space basis for the SDF through a limited number of factors. The coefficients of the SDF with respect to the factors are specified as deterministic functions of the state variables. Factor analysis and beta pricing with conditioning on state variables are reviewed in Section 4. In dynamic asset pricing models, one can distinguish between reduced-form time-series models, such as conditionally heteroskedastic factor models, and asset pricing models based on equilibrium. We propose in Section 5 an intertemporal asset pricing model based on a conditioning on state variables which includes stochastic volatility models as a particular case. In this respect, we stress the importance of timing in conditioning to generate instantaneous correlation effects called leverage effects and show how it affects the pricing of stocks, bonds and European options. We make precise how this general model with latent variables relates to standard models such as the CAPM for stocks and Black and Scholes (1973) or Hull and White (1987) for options.
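As a minimal illustration of the pricing formula (1.1), the sketch below evaluates the conditional expectation over a finite set of states. Everything in it is a made-up example rather than part of the chapter: the three states, their conditional probabilities given $J_t$, the SDF values and the payoffs are hypothetical numbers, and `sdf_price` is a hypothetical helper name.

```python
def sdf_price(probabilities, sdf, payoff):
    """Price E[m * p | J_t] when the conditional distribution has finitely many states."""
    if not (len(probabilities) == len(sdf) == len(payoff)):
        raise ValueError("probabilities, sdf and payoff must have the same length")
    return sum(q * m * p for q, m, p in zip(probabilities, sdf, payoff))

if __name__ == "__main__":
    # Hypothetical conditional probabilities and SDF values in three states.
    probs = [0.25, 0.50, 0.25]
    m = [1.20, 0.95, 0.70]        # the SDF is high in the bad state, low in the good state
    bond = [1.0, 1.0, 1.0]        # one-period risk-free payoff
    stock = [80.0, 100.0, 125.0]  # risky payoff

    bond_price = sdf_price(probs, m, bond)    # equals E[m | J_t]
    stock_price = sdf_price(probs, m, stock)
    # A portfolio is priced by the same functional, hence linearly in its holdings.
    portfolio = [2.0 * b + 1.0 * s for b, s in zip(bond, stock)]
    print(bond_price, stock_price, sdf_price(probs, m, portfolio))
```

The price of the risk-free payoff is simply the conditional mean of the SDF, and the portfolio price equals the corresponding combination of the individual prices, which is the linearity discussed in Section 2.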

2 Stochastic discount factors and conditioning information

Since Harrison and Kreps (1979) and Chamberlain and Rothschild (1983), it is well-known that, when asset markets are frictionless, portfolio prices can be characterized as a linear valuation functional that assigns prices to the portfolio payoffs.


Hansen and Richard (1987) analyze asset pricing functions in the presence of conditioning information. Their main contribution is to show that these pricing functions can be represented using random variables included in the collection of payoffs from portfolios. In this section we summarize the mathematics of a stochastic discount factor in a conditional setting following Hansen and Richard (1987). We focus on one-period securities as in their original analysis. In the next section, we will provide an extended framework with state variables to accommodate multiperiod securities.

We start with a probability space $(\Omega, \mathcal{A}, P)$. We denote by $J_t$, a sub-sigma algebra of $\mathcal{A}$, the conditioning information available to economic agents at date t. Agents form portfolios of assets based on this information, which includes in particular the prices of these assets. A one-period security purchased at time t has a payoff p at time (t + 1). For such securities, an asset pricing model $\pi_t(\cdot)$ defines for the elements p of a set $P_{t+1} \subset J_{t+1}$ of payoffs a price $\pi_t(p) \in J_t$. The payoff space includes the payoffs of primitive assets, but investors can also create new payoffs by forming portfolios.

Assumption 2.1 (Portfolio formation) $p_1, p_2 \in P_{t+1} \Rightarrow w_1 p_1 + w_2 p_2 \in P_{t+1}$ for any variables $w_1, w_2 \in J_t$.

Since we always maintain a finite-variance assumption for asset payoffs, $P_{t+1}$ is, by virtue of Assumption 2.1, a pre-Hilbertian vector space included in:

\[ P^+_{t+1} = \{ p \in J_{t+1} ;\ E[p^2 \mid J_t] < +\infty \}, \]

which is endowed with the conditional scalar product:

\[ \langle p_1, p_2 \rangle_{J_t} = E[p_1 p_2 \mid J_t]. \tag{2.1} \]

The pricing functional $\pi_t(\cdot)$ is assumed to be linear on the vector space $P_{t+1}$ of payoffs; this is basically the standard “law of one price” assumption, that is, a very weak version of a no-arbitrage condition.

Assumption 2.2 (Law of one price) For any $p_1$ and $p_2$ in $P_{t+1}$ and any $w_1, w_2 \in J_t$: $\pi_t(w_1 p_1 + w_2 p_2) = w_1 \pi_t(p_1) + w_2 \pi_t(p_2)$.

The Hilbertian structure (2.1) will be used for orthogonal projections on the set $P_{t+1}$ of admissible payoffs both in the proof of Theorem 2.3 below (a conditional version of the Riesz representation theorem) and in Section 4. Of course, this implies that we maintain an assumption of closedness for $P_{t+1}$. Indeed, Assumption 2.2 can be extended to an infinite series of payoffs to ensure not only a property of closedness for $P_{t+1}$ but also a continuity property for $\pi_t(\cdot)$ on $P_{t+1}$ with appropriate notions of convergence for both prices and payoffs. With these assumptions and a technical condition ensuring the existence of a payoff with nonzero price to rule out trivial pricing functions, one can state the fundamental theorem of Hansen and Richard (1987), which is a conditional extension of the Riesz representation theorem.

Theorem 2.3 There exists a unique payoff $p^*$ in $P_{t+1}$ that satisfies:
(i) $\pi_t(p) = E[p^* p \mid J_t]$ for all p in $P_{t+1}$;
(ii) $P[E[p^{*2} \mid J_t] > 0] = 1$.

In other words, the particular payoff which is used to characterize any asset price is almost surely nonzero. With an additional no-arbitrage condition, it can be shown to be almost surely positive.

3 Conditioning the discounted payoffs on state variables

We just stated that, given the law of one price, a pricing function $\pi_t(\cdot)$ for a conditional linear space $P_{t+1}$ of payoffs can be represented by a particular payoff $p^*$ such that condition (i) of Theorem 2.3 is fulfilled. In this section, we do not focus on the interpretation of the stochastic discount factor as a particular payoff. Instead, we consider a time series $(m_{t+1})_{t \geq 1}$ of admissible SDFs or pricing kernels, which means that, at each date t, $m_{t+1}$ belongs to the set $M_{t+1}$ defined as:

\[ M_{t+1} = \{ m_{t+1} \in P^+_{t+1} ;\ \pi_t(p_{t+1}) = E[m_{t+1} p_{t+1} \mid J_t],\ \forall p_{t+1} \in P_{t+1} \}. \tag{3.1} \]
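The difference between the unique payoff $p^*$ of Theorem 2.3 and the larger set $M_{t+1}$ of admissible SDFs can be illustrated in a finite-state setting. The following is only a sketch under assumptions that are not in the text: three states of nature with hypothetical conditional probabilities `q`, two primitive payoffs `X` and hypothetical prices `prices`. In that setting $p^*$ is the combination of primitive payoffs whose Gram matrix weights reproduce the prices, and adding any variable orthogonal to the payoff space gives another element of $M_{t+1}$.

```python
import numpy as np

# Hypothetical three-state example (all numbers are illustrative assumptions).
q = np.array([0.2, 0.5, 0.3])          # state probabilities given J_t
X = np.array([[1.0, 1.0, 1.0],         # payoff of a riskless asset in each state
              [0.5, 1.0, 2.0]])        # payoff of a risky asset in each state
prices = np.array([0.95, 1.00])        # prices pi_t of the two primitive payoffs

# Gram matrix of the payoffs for the conditional scalar product (2.1).
gram = X @ np.diag(q) @ X.T

# p* = c'X with c solving gram @ c = prices, so that E[p* x_i | J_t] = pi_t(x_i).
c = np.linalg.solve(gram, prices)
p_star = c @ X

# A variable orthogonal to both payoffs under (2.1): in three states, the cross
# product of the probability-weighted payoffs spans that orthogonal direction.
e = np.cross(X[0] * q, X[1] * q)
m_other = p_star + 0.1 * e             # another admissible SDF, outside the payoff space

# Pricing an arbitrary portfolio w1*x1 + w2*x2 with both SDFs.
w = np.array([2.0, -1.5])
price_law_of_one_price = w @ prices
price_with_p_star = q @ (p_star * (w @ X))
price_with_m_other = q @ (m_other * (w @ X))
```

Both `p_star` and `m_other` price every portfolio of the two assets identically, but only `p_star` is itself a portfolio payoff, which is the sense in which Theorem 2.3 asserts uniqueness.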

For a given asset, we will write the asset pricing formula as:

\[ \pi_t = E[m_{t+1} p_{t+1} \mid J_t]. \tag{3.2} \]

For the implementation of such a pricing formula, we need to model the joint probability distribution of $(m_{t+1}, p_{t+1})$ given $J_t$. To do this, we will stress the usefulness of factors and state variables. We will suppose without loss of generality² that the future payoff is the future price of the asset itself, $\pi_{t+1}$. The problem is therefore to find the pricing function $\varphi_t(J_t)$ such that:

\[ \varphi_t(J_t) = E[m_{t+1} \varphi_t(J_{t+1}) \mid J_t]. \tag{3.3} \]

2 As usual, if there are dividends or other cashflows, they may be included in the price by a convenient discounted sum. We will abandon this convenient expositional shortcut when we refer to more specific assets in subsequent sections.

Both factors and state variables are useful to reduce the dimension of the problem to be solved in (3.3). To see this, one can decompose the information $J_t$ into three types of variables. First, one can include asset-specific variables denoted $Y_t$, which

should contain at least the price π t . Dividends as well as other variables which may help characterize m t+1 could be included without really complicating matters. Second, the information will contain a vectorial process Ft of factors. Such factors could be suggested by economic theory or chosen purely on statistical grounds. For example, in equilibrium models, a factor could be the consumption growth process. In factor models, they could be observable macroeconomic indicators or latent factors to be extracted from a universe of asset returns. In both cases these variables are viewed as explanatory factors, possibly latent, of the collection of asset prices at time t. The purpose of these factors is to reduce the cross-sectional dimension of the collection of assets. Third, it is worthwhile to introduce a vectorial process Ut of exogenous state variables in order to achieve a longitudinal reduction of dimension. Two assumptions are made about the conditional probability distribution of (Yt , Ft )1≤t≤T knowing U1T = (Ut )1≤t≤T (for any T -tuplet t = 1, . . . , T of dates of interest) to support the claim that the processes making up Ut summarize the dynamics of the processes (Yt , Ft ). First we assume that the state variables subsume all temporal links between the variables of interest. Assumption 3.1 The pairs (Yt , Ft )1≤t≤T , t = 1, . . . , T are mutually independent knowing U1T = (Ut )1≤t≤T . According to the standard latent factor analysis terminology, Assumption 3.1. means that the TH variables Ut ∈ R H , t = 1, . . . , T provide a complete system of factors to account for the relationships between the variables (Yt , Ft )1≤t≤T (see for example Bartholomew (1987), p. 5). In the original latent variable modeling of Burt (1941) and Spearman (1927) in the early part of the century to study human intelligence, Yt represented an individual’s score to the test number t of mental ability. 
The basic idea was that individual scores at various tests will become independent (with repeated observations on several human subjects) given a latent factor called general intelligence. In our modeling, t denotes a date. When, with only one observation of the path of (Yt , Ft ), t = 1, . . . , T , we assume that these variables become independent given some latent state variables, it is clear that we also have in mind a standard temporal structure which provides an empirical content to this assumption. A minimal structure to impose is the natural assumption that only past and present values Uτ , τ = 1, 2, . . . , t of the state variables matter for characterizing the probability distribution of (Yt , Ft ). Assumption 3.2 The conditional probability distribution of (Yt , Ft ) given U1T = (Ut )1≤t≤T coincides, for any t = 1, . . . , T , with the conditional probability distribution given U1t = (Uτ )1≤τ ≤t .


Assumption 3.2 is the following conditional independence³ property:

\[ (Y_t, F_t) \perp\!\!\!\perp (U_{t+1}^T) \mid (U_1^t) \tag{3.4} \]

for any t = 1, ..., T. Property (3.4) coincides with the definition of noncausality by Sims (1972) insofar as Assumption 3.1 is maintained, and means that (Y, F) do not cause U in the sense of Sims.⁴ If we are ready to assume that the joint probability distribution of all the variables of interest is defined by a density function $\ell$, Assumptions 3.1 and 3.2 are summarized by:

\[ \ell[(Y_t, F_t)_{1 \leq t \leq T} \mid U_1^T] = \prod_{t=1}^{T} \ell[(Y_t, F_t) \mid U_1^t]. \tag{3.5} \]

The framework defined by (3.5) is very general for state-space modeling and extends such standard models as parameter driven models described in Cox (1981), stochastic volatility models as well as the state-space time series models (see Harvey (1989)). Our vector $U_t$ of state variables can also be seen as a hidden Markov chain, a popular tool in nonlinear econometrics to model regime switches introduced by Hamilton (1989). The merit of Assumptions 3.1 and 3.2 for asset pricing is to summarize the relevant conditioning information by the set $U_1^t$ of current and past values of the state variables:

\[ \ell[(Y_{t+1}, F_{t+1}, U_{t+1}) \mid (Y_\tau, F_\tau)_{1 \leq \tau \leq t}, U_1^t] = \ell[(Y_{t+1}, F_{t+1}, U_{t+1}) \mid U_1^t]. \tag{3.6} \]

In practice, to make (3.6) useful, one would like to limit the relevant past by a homogeneous Markovianity assumption.

Assumption 3.3 The conditional probability distribution of $(Y_{t+1}, F_{t+1}, U_{t+1})$ given $U_1^t$ coincides, for any t = 1, ..., T, with the conditional probability distribution given $U_t$. Moreover, this probability distribution does not depend on t.

This assumption implies that the multivariate process $U_t$ is homogeneous Markovian of order one.⁵

3 See Florens, Mouchart and Rollin (1990) for a systematic study of the concept of conditional independence and Florens and Mouchart (1982) for its relation with noncausality.

4 This noncausality concept is equivalent to the noncausality notion developed by Granger (1969). Assumption 3.2 can be equivalently replaced by an assumption stating that the state variables U can be optimally forecasted from their own past, with the knowledge of past values of other variables being useless (see Renault (1999)).

5 As usual, since the dimension of the multivariate process $U_t$ is not limited a priori, the assumption of Markovianity of order one is not restrictive with respect to higher order Markov processes. For brevity, we will hereafter term Assumption 3.3 the assumption of Markovianity of the process $U_t$.


Given these assumptions, we are allowed to conclude that the pricing function, as characterized by (3.3), will involve the conditioning information only through the current value $U_t$ of the state variables. Indeed, (3.6) can be rewritten:

\[ \ell[(Y_{t+1}, F_{t+1}, U_{t+1}) \mid (Y_\tau, F_\tau)_{1 \leq \tau \leq t}, U_1^t] = \ell[(Y_{t+1}, F_{t+1}, U_{t+1}) \mid U_t]. \tag{3.7} \]
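A simple way to picture Assumptions 3.1 to 3.3 is a regime-switching simulation in which the state variable is a two-state Markov chain and the factor is drawn independently across dates once the path of the state is known. The sketch below is only an illustration under assumed values: the transition matrix `P`, the regime means `mu` and the regime volatilities `sigma` are hypothetical, and the Gaussian form of the factor given the state is an assumption made for convenience, not a feature of the general framework.

```python
import numpy as np

def simulate_state_and_factor(T, P, mu, sigma, seed=0):
    """Simulate a homogeneous Markov chain U_t and a factor F_t whose
    conditional distribution depends on the past only through U_t."""
    rng = np.random.default_rng(seed)
    U = np.empty(T, dtype=int)
    U[0] = 0
    for t in range(1, T):
        # Markovianity of order one: the next state depends on U_{t-1} only.
        U[t] = rng.choice(len(P), p=P[U[t - 1]])
    # Given the whole path of U, the F_t are drawn independently across dates.
    F = mu[U] + sigma[U] * rng.standard_normal(T)
    return U, F

# Hypothetical two-regime parameters (calm regime 0, turbulent regime 1).
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
mu = np.array([0.01, -0.02])
sigma = np.array([0.10, 0.30])

U, F = simulate_state_and_factor(5000, P, mu, sigma)
vol_by_regime = np.array([F[U == s].std() for s in (0, 1)])
```

Unconditionally the simulated factor is serially dependent in its variance because the regime is persistent, while conditionally on the path of `U` it has no temporal link, which is the content of (3.5).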

We have seen how the dimension reduction is achieved in the longitudinal direction. To arrive at a similar reduction in the cross-sectional direction, one needs to add an assumption about the dimension of the range of $m_{t+1}$, given the state variables $U_t$. We assume that this range is spanned by K factors, $F_{kt+1}$, k = 1, ..., K, given as components of the process $F_{t+1}$.

Assumption 3.4 (SDF spanning) $m_{t+1}$ is a deterministic function of the variables $U_t$ and $F_{t+1}$.

This assumption is not as restrictive as it might appear since it can be maintained when there exists an admissible SDF $m_{t+1}$ with an unsystematic part $\varepsilon_{t+1} = m_{t+1} - E[m_{t+1} \mid F_{t+1}, U_t]$ that is uncorrelated, given $U_t$, with any feasible payoff $p_{t+1} \in P_{t+1}$. Actually, in this case, $\tilde{m}_{t+1} = E[m_{t+1} \mid F_{t+1}, U_t]$ is another admissible SDF since $E[m_{t+1} p_{t+1} \mid U_t] = E[\tilde{m}_{t+1} p_{t+1} \mid U_t]$ for any $p_{t+1} \in P_{t+1}$, and $\tilde{m}_{t+1}$ is by definition conformable to Assumption 3.4. In Section 4 below, we will consider a linear SDF spanning, even if Assumption 3.4 allows for more general factor structures such as log-linear factor models of interest rates in Duffie and Kan (1996) and Dai and Singleton (1999) or nonlinear APT (see Bansal et al., 1993). The linear benchmark is of interest when, for statistical or economic reasons, it appears useful to characterize the SDF as an element of a particular K-dimensional vector space, possibly time-varying through state variables. This is in contrast with nonlinear factor pricing where structural assumptions make a linear representation irrelevant for structural interpretations, even though it would remain mathematically correct.⁶ The linear case is of course relevant when the asset pricing model is based on a linear factor model for asset returns as in Ross (1976) as we will see in the next section.

6 We will see in particular in Section 5 that a log-linear setting appears justified by a natural log-normal model of returns given state variables.

4 Affine regression of payoffs on factors with conditioning on state variables

The longitudinal reduction of dimension through state variables put forward in Section 3 will be used jointly with the cross-sectional reduction of dimension through factors in the context of a conditional affine regression of payoffs or returns on factors. More precisely, the factor loadings, which are the regression coefficients on factors and which are often called beta coefficients, will be considered from a conditional viewpoint, where the conditioning information set will be summarized by state variables given (3.7). We will first introduce the conditional beta coefficients and the corresponding conditional beta pricing formulas. We will then revisit the standard asset pricing theory which underpins these conditional beta pricing formulas, namely the arbitrage pricing theory of Ross (1976) stated in a conditional factor analysis setting.

4.1 Conditional beta coefficients

We first introduce conditional beta coefficients for payoffs, then for returns.

Definition 4.1 The conditional affine regression $EL_t[p_{t+1} \mid F_{t+1}]$ of a payoff $p_{t+1}$ on the vector $F_{t+1}$ of factors given the information $J_t$ is defined by:

\[ EL_t[p_{t+1} \mid F_{t+1}] = \beta_{0t} + \sum_{k=1}^{K} \beta_{kt} F_{kt+1} \tag{4.1} \]

with $\varepsilon_{t+1} = p_{t+1} - EL_t[p_{t+1} \mid F_{t+1}]$ satisfying:

\[ E[\varepsilon_{t+1} \mid J_t] = 0, \qquad \mathrm{Cov}[\varepsilon_{t+1}, F_{t+1} \mid J_t] = 0. \]

Similarly, if we denote by $r_{t+1} = p_{t+1} / \pi_t(p_{t+1})$ the return of an asset with a payoff⁷ $p_{t+1}$, we define the conditional affine regression of the return $r_{t+1}$ on $F_{t+1}$ by:

\[ EL_t[r_{t+1} \mid F_{t+1}] = \beta^r_{0t} + \sum_{k=1}^{K} \beta^r_{kt} F_{kt+1}. \tag{4.2} \]

Of course, the beta coefficients of returns can be related to the beta coefficients of payoffs by:

\[ \beta^r_{kt} = \frac{\beta_{kt}}{\pi_t(p_{t+1})} \quad \text{for } k = 0, 1, 2, \ldots, K. \tag{4.3} \]

Moreover, the characterization of conditional probability distributions in terms of returns instead of payoffs makes more explicit the role of state variables. To see this, let us describe payoffs at time t + 1 from the price at the same date and a dividend process by:⁸

\[ p_{t+1} = \pi_{t+1} + D_{t+1}. \tag{4.4} \]

7 Strictly speaking, the return is not defined for states of nature where $\pi_t(p_{t+1}) = 0$. This may complicate the statement of characterization of the SDF in terms of expected returns as in the main theorem (Theorem 4.4) of this section. However, this technical difficulty may be solved by considering portfolios which contain a particular asset with nonzero price in any state of nature. This technical condition ensuring the existence of such a payoff with nonzero price has already been mentioned in Section 2 (see also the sufficient condition (4.11) below when there exists a riskless asset). In what follows, the corresponding technicalities will be neglected.

8 As announced in Section 3, we depart from the expositional shortcut where the price included discounted dividends.

Following Assumption 3.1, we will assume that the rates of growth of dividends⁹ are asset-specific variables $Y_t$ and serially uncorrelated given state variables. In other words, $Y_t = D_t / D_{t-1}$, t = 1, 2, ..., T, are mutually independent given $U_1^T$. Moreover, $\pi_{t+1}$ in (4.4) has to be interpreted as the price at time (t + 1) of the same asset with price $\pi_t$ at time t defined from the pricing functional (3.3). In other words, the pricing equation (3.3) can be rewritten:

\[ \frac{\varphi_t(J_t)}{D_t} = E\left[ m_{t+1} \frac{D_{t+1}}{D_t} \left( \frac{\varphi_t(J_{t+1})}{D_{t+1}} + 1 \right) \Big| J_t \right]. \tag{4.5} \]

Given Assumptions 3.1, 3.2 and 3.3, we are allowed to conclude that, under general regularity conditions,¹⁰ Equation (4.5) defines a unique time-invariant deterministic function $\varphi(\cdot)$ such that:

\[ \varphi(U_t) = E\left[ m_{t+1} \frac{D_{t+1}}{D_t} (\varphi(U_{t+1}) + 1) \Big| U_t \right]. \tag{4.6} \]

In other words, we get the following decomposition formulas for prices and returns:

\[ \pi_t = \varphi(U_t) D_t, \qquad r_{t+1} = \frac{\pi_{t+1} + D_{t+1}}{\pi_t} = \frac{D_{t+1}}{D_t} \, \frac{\varphi(U_{t+1}) + 1}{\varphi(U_t)}. \tag{4.7} \]

A by-product of this decomposition is that, by application of (3.7), the joint conditional probability distribution of future factors and returns $(F_\tau, r_\tau)_{\tau > t}$ given $J_t$ depends upon $J_t$ only through $U_t$ in a homogeneous way. In particular, the conditional beta coefficients of returns are fixed deterministic functions of the current value of state variables:

\[ \beta^r_{kt} = \beta^r_k(U_t) \quad \text{for } k = 0, 1, 2, \ldots, K. \tag{4.8} \]
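When the state variable takes finitely many values, the fixed-point equation (4.6) reduces to a linear system, which makes the contraction argument concrete. The sketch below assumes a hypothetical two-state Markov chain with transition matrix `P`, and a hypothetical matrix `a` whose entry (i, j) stands for the expectation of $m_{t+1} D_{t+1}/D_t$ given a transition from state i to state j; neither object is specified in the text. For the return check, dividend growth is further assumed to be deterministic given the transition.

```python
import numpy as np

def price_dividend_ratio(P, a):
    """Solve phi = A (phi + 1) with A = P * a (elementwise), i.e. (4.6) on a
    finite state space. Requires the spectral radius of A to be below one."""
    A = P * a
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        raise ValueError("no contraction: the fixed point of (4.6) is not guaranteed")
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - A, A @ np.ones(n))

# Hypothetical inputs (illustrative numbers only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # transition probabilities of U_t
a = np.array([[0.97, 0.93],
              [0.99, 0.95]])          # E[m_{t+1} D_{t+1}/D_t | U_t = i, U_{t+1} = j]

phi = price_dividend_ratio(P, a)      # price-dividend ratio in each state

# Returns by transition, following (4.7), under the extra assumption that
# dividend growth is deterministic given the transition (i, j).
growth = np.array([[1.01, 1.03],
                   [0.99, 1.02]])
returns = growth * (phi[None, :] + 1.0) / phi[:, None]

# Implied SDF by transition and the conditional pricing check E[m r | U_t] = 1.
m = a / growth
euler = (P * m * returns).sum(axis=1)
```

The vector `euler` equals one in each state, which restates (3.2) for the return and confirms that prices depend on the conditioning information only through the current state.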

4.2 Conditional beta pricing

Since the seminal papers of Sharpe (1964) and Lintner (1965) on the unconditional CAPM to the most recent literature on conditional beta pricing (see e.g. Harvey (1991), Ferson and Korajczyk (1995)), beta coefficients with respect to well-chosen factors are put forward as convenient measures of compensated risk which explain the discrepancy between expected returns among a collection of financial assets. In order to document these traditional approaches in the modern setting of SDF, we have to add two fairly innocuous additional assumptions.

9 Stationarity (see Assumption 3.3) requires that we include the growth rates of dividends and not their levels in the variables $Y_t$.

10 These regularity conditions amount to the possibility of applying a contraction mapping argument to ensure the existence and unicity of a fixed point $\varphi(\cdot)$ of the functional defining the right hand side of (4.6).

Assumption 4.2 If $p_{Ft+1}$ denotes the orthogonal projection (for the conditional scalar product (2.1)) of the constant vector $\iota$ on the space $P_{t+1}$ of feasible payoffs, the set $M_{t+1}$ of admissible SDFs does not contain a variable $\lambda_t p_{Ft+1}$ with $\lambda_t \in J_t$.

Assumption 4.3 Any admissible SDF has a nonzero conditional expectation given $J_t$.

Without Assumption 4.2, one could write for any $p_{t+1} \in P_{t+1}$:

\[ \pi_t(p_{t+1}) = \lambda_t E[p_{Ft+1} p_{t+1} \mid J_t] = \lambda_t E[p_{t+1} \mid J_t]. \tag{4.9} \]

Therefore, all the feasible expected returns would coincide with $1/\lambda_t$. When there is a riskless asset, Assumption 4.2 simply means that an admissible SDF $m_{t+1}$ should be genuinely stochastic at time t, that is, not an element of the available information $J_t$ at time t. Without Assumption 4.3, one could write the price $\pi_t(p_{t+1})$ as:

\[ \pi_t(p_{t+1}) = E[m_{t+1} p_{t+1} \mid J_t] = \mathrm{Cov}[m_{t+1}, p_{t+1} \mid J_t], \tag{4.10} \]

which would not depend on the expected payoff $E[p_{t+1} \mid J_t]$. When there is a riskless asset, Assumption 4.3 would be implied by a positivity requirement:¹¹

\[ P[p > 0] = 1 \Rightarrow P[\pi_t(p) \leq 0] = 0. \tag{4.11} \]

With these two assumptions, we can state the central theorem of this section, which links linear SDF spanning with linear beta pricing and multibeta models of expected returns.

Theorem 4.4 The three following properties are equivalent:

P1: Linear Beta Pricing: $\exists\, m_{t+1} \in M_{t+1}$, $\forall p_{t+1} \in P_{t+1}$:

\[ \pi_t(p_{t+1}) = \beta_{0t} E[m_{t+1} \mid U_t] + \sum_{k=1}^{K} \beta_{kt} E[m_{t+1} F_{kt+1} \mid U_t], \tag{4.12} \]

P2: Linear SDF Spanning: $\exists\, m_{t+1} \in M_{t+1}$, $\exists\, \lambda_{kt} \in J_t$, k = 0, 1, 2, ..., K:

\[ \lambda_{kt} = \lambda_k(U_t) \quad \text{and} \quad m_{t+1} = \lambda_0(U_t) + \sum_{k=1}^{K} \lambda_k(U_t) F_{kt+1}, \tag{4.13} \]

P3: Multibeta Model of Expected Returns: $\exists\, \nu_{kt} \in J_t$, k = 0, 1, 2, ..., K, for any feasible return $r_{t+1}$:

\[ E[r_{t+1} \mid U_t] = \nu_{0t} + \sum_{k=1}^{K} \nu_{kt} \beta^r_k(U_t). \tag{4.14} \]

11 This positivity requirement implies the continuity of the pricing function $\pi_t(\cdot)$ needed for establishing Theorem 2.3.

Theorem 4.4 can be proved (see Renault, 1999) from three sets of assumptions: assumptions which ensure the existence of admissible SDFs (Section 2), assumptions about the state variables (Section 3), and technical Assumptions 4.2 and 4.3. Three main lessons can be drawn from Theorem 4.4:

(i) It makes explicit what we have called a cross-sectional reduction of dimension through factors, generally conceived to ensure SDF spanning, and more precisely linear SDF spanning, which corresponds to the specification (4.13) of the deterministic function referred to in Assumption 3.4. With a linear beta pricing formula, prices $\pi_t(p_{t+1})$ of a large cross-sectional collection of payoffs $p_{t+1} \in P_{t+1}$ can be computed from the prices of K + 1 particular “assets”:

\[ \pi_t(\iota) = E[m_{t+1} \mid J_t] = E[m_{t+1} \mid U_t], \qquad \pi_t(F_{kt+1}) = E[m_{t+1} F_{kt+1} \mid J_t] = E[m_{t+1} F_{kt+1} \mid U_t], \quad k = 1, 2, \ldots, K. \tag{4.15} \]

If there does not exist a riskless asset or if some factors are not feasible payoffs, one can always interpret suitably normalized factors as returns on particular portfolios called mimicking portfolios. Moreover, since the only property of factors which matters is linear SDF spanning, one may assume without loss of generality that $\mathrm{Var}[F_{t+1} \mid U_t]$ is nonsingular to avoid redundant factors. The beta coefficients are then computed directly by:¹²

\[ [\beta_{1t}, \beta_{2t}, \ldots, \beta_{Kt}] = \mathrm{Cov}[p_{t+1}, F_{t+1} \mid J_t] \, \mathrm{Var}[F_{t+1} \mid U_t]^{-1}, \qquad \beta_{0t} = E[p_{t+1} \mid J_t] - \sum_{k=1}^{K} \beta_{kt} E[F_{kt+1} \mid U_t] \tag{4.16} \]

to deduce the price:

\[ \pi_t(p_{t+1}) = \beta_{0t} \pi_t(\iota) + \sum_{k=1}^{K} \beta_{kt} \pi_t(F_{kt+1}). \tag{4.17} \]

The cross-sectional reduction of dimension consists of computing only K + 1 factor prices $(\pi_t(\iota), \pi_t(F_{kt+1}))$ to price any payoff. The longitudinal reduction of dimension is also exploited since the pricing formula for these factors (4.15) depends on the conditioning information $J_t$ only through $U_t$.

12 When the payoffs include dividends, the only relevant conditioning information is characterized by state variables:

\[ \mathrm{Cov}[p_{t+1}, F_{t+1} \mid J_t] = D_t \, \mathrm{Cov}\left[ \frac{p_{t+1}}{D_t}, F_{t+1} \Big| U_t \right], \qquad E[p_{t+1} \mid J_t] = D_t \, E\left[ \frac{p_{t+1}}{D_t} \Big| U_t \right]. \]

(ii) Even though the linear beta pricing formula P1 is mathematically equivalent to the linear SDF spanning property P2, it is interesting to characterize it by a property of the set of feasible returns under the maintained Assumption 3.4 of SDF spanning. More precisely, since this assumption allows us to write:

\[ \pi_t(p_{t+1}) = E[m_{t+1} E[p_{t+1} \mid F_{t+1}, J_t] \mid J_t], \tag{4.18} \]

P1 is obtained as soon as a linear factor model of payoffs or returns is assumed (see e.g. Engle, Ng and Rothschild (1990)¹³). It means that the conditional expectation of payoffs given factors and $J_t$ coincides with the conditional affine regression (given $J_t$) of these payoffs on these factors:

\[ E[p_{t+1} \mid F_{t+1}, J_t] = EL_t[p_{t+1} \mid F_{t+1}] = \beta_{0t} + \sum_{k=1}^{K} \beta_{kt} F_{kt+1}. \tag{4.19} \]

Such a linear factor model can for instance be deduced from an assumption of joint conditional normality of returns and factors. This is the case when factors are themselves returns on some mimicking portfolios and returns are jointly conditionally Gaussian. The standard CAPM illustrates the linear structure that is obtained from such a joint normality assumption for returns. However, the main implication of linear beta pricing is the zero-price property of idiosyncratic risk ($\varepsilon_{t+1}$ in the notation of Definition 4.1) since only the systematic part of the payoff $p_{t+1}$ is compensated:¹⁴

\[ \pi_t(p_{t+1}) = \pi_t(EL_t(p_{t+1} \mid F_{t+1})), \tag{4.20} \]

that is: $\pi_t(\varepsilon_{t+1}) = 0$. As we will see in more detail in Subsection 4.3 below, this zero-price property for the idiosyncratic risk lays the basis for the APT model developed by Ross (1976). Moreover, if a factor is not compensated because $E[m_{t+1} F_{kt+1} \mid U_t] = 0$, it can be forgotten in the beta pricing formula. In other words, irrespective of the statistical procedure used to build the factors, only the compensated factors have to be kept:

\[ \pi_t(F_{kt+1}) = E[m_{t+1} F_{kt+1} \mid U_t] \neq 0, \quad \text{for } k = 1, \ldots, K. \tag{4.21} \]

(iii) The minimal list of factors that have to be kept may also be characterized by the spanning interpretation P2. In this respect, the number of factors is purely a matter of convention: how many factors do we want to introduce to span the one-dimensional space where the SDF evolves? The existence of the SDF proves that a one-factor model with the SDF itself as the sole factor is always correct. The definition of K factors becomes an issue for reasons such as economic interpretation, statistical procedures or financial strategies. Moreover, this definition can be changed as long as it keeps invariant the corresponding spanned vector space. For instance, one may assume that, conditionally to $J_t$, the factors are mutually uncorrelated, that is, $\mathrm{Var}[F_{t+1} \mid J_t]$ is a nonsingular diagonal matrix. One may also rescale the factors to obtain unit variance factors (statistical motivation) or unit cost factors (financial motivation). Let us focus on the latter by assuming that:

\[ \pi_t(F_{kt+1}) = E[m_{t+1} F_{kt+1} \mid U_t] = 1, \quad \text{for } k = 1, \ldots, K. \tag{4.22} \]

By (4.21), the factor $F_{kt+1}$ can be replaced by its scaled value $F_{kt+1} / \pi_t(F_{kt+1})$ to get (4.22) without loss of generality. Each factor can then be interpreted as a return on a portfolio (a payoff of unit price) even though we do not assume that there exists a feasible mimicking portfolio ($F_{kt+1} \in P_{t+1}$). This normalization rule allows us to prove that the coefficients in the multibeta model of expected returns (P3) are given by:

\[ \nu_{kt} = E[F_{kt+1} \mid U_t] - \nu_{0t} \quad \text{for } k = 1, \ldots, K. \tag{4.23} \]

Since, on the other hand, it is easy to check that:

\[ \nu_{0t} = \frac{1}{E[m_{t+1} \mid U_t]} \tag{4.24} \]

coincides with the risk-free return when there exists a risk-free asset, the multibeta model (P3) of expected returns can be rewritten in the more standard form:

\[ E[r_{t+1} \mid U_t] - \nu_{0t} = \sum_{k=1}^{K} \beta^r_k(U_t) \left[ E[F_{kt+1} \mid U_t] - \nu_{0t} \right], \tag{4.25} \]

which gives the risk premium of the asset as a linear combination of the risk premia of the various factors, with weights defined by the beta coefficients viewed as risk quantities. Moreover, (4.25) is very useful for statistical inference in factor models (see in particular Subsection 4.3) since it means that the beta pricing formula is characterized by the nullity of the intercept term in the conditional regression of net returns on net factors, given $U_t$.

13 However, these authors maintain simultaneously the two assumptions of linear SDF spanning and linear factor model of returns. These two assumptions are clearly redundant as explained above.

14 The prices of the systematic and idiosyncratic parts are defined, by abuse of notation, by their conditional scalar product with the SDF $m_{t+1}$.
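The three lessons can be checked numerically in a finite-state setting. The sketch below is an illustration under assumptions that are not in the text: eight equally likely states given $U_t$, two randomly generated factors, and an SDF that is linear in the factors with hypothetical coefficients. It computes betas as in (4.16), compares the beta pricing formula (4.17) with direct SDF pricing, then rescales the factors to unit cost as in (4.22) and checks the risk-premium form (4.25).

```python
import numpy as np

rng = np.random.default_rng(0)
S, K = 8, 2                                   # hypothetical: 8 states, 2 factors
q = np.full(S, 1.0 / S)                       # state probabilities given U_t
F = 1.0 + 0.2 * rng.standard_normal((S, K))   # factor values in each state
m = 1.2 + F @ np.array([-0.15, -0.10])        # linear SDF as in (4.13), assumed coefficients
p = rng.uniform(0.5, 1.5, size=S)             # an arbitrary payoff

def expect(x):
    """Conditional expectation over the finite state space."""
    return q @ x

def affine_betas(y, factors):
    """Intercept and slopes of the affine regression of y on the factors, as in (4.16)."""
    centered = factors - expect(factors)
    var_f = centered.T @ (centered * q[:, None])
    cov_yf = (y - expect(y)) @ (centered * q[:, None])
    slopes = np.linalg.solve(var_f, cov_yf)
    return expect(y) - slopes @ expect(factors), slopes

# Beta pricing (4.17) against direct SDF pricing (3.2).
pi_iota = expect(m)                           # price of the constant payoff, (4.15)
pi_F = expect(m[:, None] * F)                 # factor prices, (4.15)
beta0, beta = affine_betas(p, F)
price_beta = beta0 * pi_iota + beta @ pi_F
price_direct = expect(m * p)

# Unit-cost factors (4.22) and the multibeta model in risk-premium form (4.25).
G = F / pi_F                                  # rescaled factors with unit price
r = p / price_direct                          # return on the payoff
_, beta_r = affine_betas(r, G)
nu0 = 1.0 / pi_iota                           # (4.24)
premium_lhs = expect(r) - nu0
premium_rhs = beta_r @ (expect(G) - nu0)
```

The agreement of `price_beta` with `price_direct` reflects the zero price of the regression residual under a linear SDF, and the agreement of the two premium terms is the zero-intercept property used for inference.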

4.3 Conditional factor analysis

Factor analysis with a cross-sectional point of view has been popularized by Ross (1976) to provide some foundations to multibeta models of expected returns. The basic idea is to start, for a countable sequence of assets i = 1, 2, ..., with the decomposition of their payoffs or returns into systematic and idiosyncratic parts with respect to K variables $F_{kt+1}$, k = 1, 2, ..., K, considered as candidate factors:

\[ r_{it+1} = \beta^r_{i0}(U_t) + \sum_{k=1}^{K} \beta^r_{ik}(U_t) F_{kt+1} + \varepsilon_{it+1}, \]
\[ E[\varepsilon_{it+1} \mid U_t] = 0, \qquad \mathrm{Cov}[F_{kt+1}, \varepsilon_{it+1} \mid U_t] = 0 \quad \forall k = 1, 2, \ldots, K, \quad \text{for } i = 1, 2, \ldots \tag{4.26} \]

Since, as already explained, the multibeta model (P3) of expected returns amounts to assuming that idiosyncratic risks are not compensated, that is:

\[ E[m_{t+1} \varepsilon_{it+1} \mid U_t] = 0 \quad \text{for } i = 1, 2, \ldots, \tag{4.27} \]

a natural way to look for foundations of this pricing model is to ask why idiosyncratic risk should not be compensated. Ross (1976) provides the following explanation. For a portfolio in the n assets defined by shares $\theta_{in}$, i = 1, 2, ..., n, of wealth invested:

\[ \sum_{i=1}^{n} \theta_{in} = 1, \tag{4.28} \]

the unsystematic risk is measured by:

\[ \mathrm{Var}\left[ \sum_{i=1}^{n} \theta_{in} \varepsilon_{it+1} \Big| U_t \right] = \sum_{i=1}^{n} \theta_{in}^2 \sigma_i^2(U_t), \tag{4.29} \]

if we assume that the individual idiosyncratic risks are mutually uncorrelated:

\[ \mathrm{Cov}[\varepsilon_{it+1}, \varepsilon_{jt+1} \mid U_t] = 0 \quad \text{if } i \neq j, \tag{4.30} \]

and we denote the asset idiosyncratic conditional variances by $\sigma_i^2(U_t) = \mathrm{Var}[\varepsilon_{it+1} \mid U_t]$. Therefore, if it is possible to find a sequence $(\theta_{in})_{1 \leq i \leq n}$, n = 1, 2, ..., conformable to (4.28) and (4.31) below:

\[ \operatorname*{plim}_{n \to \infty} \sum_{i=1}^{n} \theta_{in}^2 \sigma_i^2(U_t) = 0, \tag{4.31} \]

the idiosyncratic risk can be diversified and should not be compensated by a simple no-arbitrage argument. Typically, this result will be valid with bounded conditional variances and equally-weighted portfolios ($\theta_{in} = 1/n$ for i = 1, 2, ..., n), since the sum in (4.31) is then bounded by the variance bound divided by n. In other words, according to Ross (1976), factors have as a basic property to define idiosyncratic risks which are mutually uncorrelated. This justifies beta pricing with respect to them and provides the following decomposition of the conditional covariance matrix of returns:

\[ \Sigma_t = \beta_t \phi_t \beta_t' + D_t \tag{4.32} \]

where $\Sigma_t$, $\beta_t$, $\phi_t$, $D_t$ are matrices of respective sizes n × n, n × K, K × K and n × n defined by:

\[ \Sigma_t = (\mathrm{Cov}(r_{it+1}, r_{jt+1} \mid U_t))_{1 \leq i \leq n, 1 \leq j \leq n}, \qquad \beta_t = (\beta^r_{ik}(U_t))_{1 \leq i \leq n, 1 \leq k \leq K}, \]
\[ \phi_t = (\mathrm{Cov}(F_{kt+1}, F_{lt+1} \mid U_t))_{1 \leq k \leq K, 1 \leq l \leq K}, \qquad D_t = (\mathrm{Cov}(\varepsilon_{it+1}, \varepsilon_{jt+1} \mid U_t))_{1 \leq i \leq n, 1 \leq j \leq n} \tag{4.33} \]

with the maintained assumption that $D_t$ is a diagonal matrix. In the particular case where returns and factors are jointly conditionally Gaussian given $U_t$, the returns are mutually independent knowing the factors in the conditional probability distribution given $U_t$. We have therefore specified a Factor Analysis model in a conditional setting. Moreover, if one adopts in such a setting some well-known results in the Factor Analysis methodology, one can claim that the model is fully defined by the decomposition (4.32) of the covariance matrix of returns with the diagonality assumption¹⁵ about the idiosyncratic variance matrix $D_t$. In particular, this decomposition defines by itself the set of K-dimensional variables $F_{t+1}$ conformable to it with the interpretation (4.33) of the matrices:

\[ F_{t+1} = E[F_{t+1} \mid U_t] + \phi_t \beta_t' \Sigma_t^{-1} (r_{t+1} - E[r_{t+1} \mid U_t]) + z_{t+1}, \tag{4.34} \]

where $r_{t+1} = (r_{it+1})_{1 \leq i \leq n}$ and $z_{t+1}$ is a K-dimensional variable assumed to be independent of $r_{t+1}$ given $J_t$ and such that:

\[ E[z_{t+1} \mid J_t] = 0, \qquad \mathrm{Var}[z_{t+1} \mid J_t] = \phi_t - \phi_t \beta_t' \Sigma_t^{-1} \beta_t \phi_t. \tag{4.35} \]
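The covariance decomposition (4.32), the diversification bound behind (4.31) and the noise variance (4.35) can be sketched numerically for a fixed value of the state variable. All inputs below are hypothetical (50 assets, 2 factors, randomly drawn loadings and idiosyncratic variances); the comparison with the Woodbury form of the matrix in (4.35) is a standard linear algebra identity added here as a consistency check, not a formula from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 50, 2                                  # hypothetical: 50 assets, 2 factors
beta = rng.normal(1.0, 0.3, size=(n, K))      # conditional factor loadings beta_t
phi = np.array([[0.04, 0.01],
                [0.01, 0.02]])                # conditional factor covariance phi_t
d = rng.uniform(0.01, 0.05, size=n)           # idiosyncratic variances, diagonal of D_t

# Factor structure (4.32) of the conditional covariance matrix of returns.
Sigma = beta @ phi @ beta.T + np.diag(d)

# Unsystematic risk (4.29) of the equally weighted portfolio.
theta = np.full(n, 1.0 / n)
idio_var = theta**2 @ d                       # bounded by max(d) / n

# Regression matrix phi beta' Sigma^{-1} of (4.34), using the symmetry of Sigma.
B = np.linalg.solve(Sigma, beta @ phi).T

# Variance (4.35) of the noise z that the returns leave undetermined.
var_z = phi - B @ beta @ phi

# Equivalent Woodbury form: (phi^{-1} + beta' D^{-1} beta)^{-1}.
var_z_woodbury = np.linalg.inv(np.linalg.inv(phi) + beta.T @ (beta / d[:, None]))
```

As n grows with bounded idiosyncratic variances, `idio_var` shrinks at rate 1/n and `var_z` shrinks as well, so that the factors are pinned down more and more precisely by the returns.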

It means that, up to an independent noise $z_{t+1}$ (which represents factor indeterminacy), the factors are rebuilt by the so-called “Thomson factor scores”:

\[ \hat{F}_{t,t+1} = E[F_{t+1} \mid U_t] + \phi_t \beta_t' \Sigma_t^{-1} (r_{t+1} - E(r_{t+1} \mid U_t)), \tag{4.36} \]

which correspond to the conditional expectation $\hat{F}_{t,t+1} = E[F_{t+1} \mid U_t, r_{t+1}]$ in the particular case where returns and factors are jointly Gaussian given $U_t$.

To summarize, according to Ross (1976) adapted in a conditional setting with latent variables, the question of specifying a multibeta model of expected returns can be addressed in two steps. In a first step, one should identify a factor structure for the family of returns:

\[ \Sigma_t = \beta_t \phi_t \beta_t' + D_t, \quad D_t \text{ diagonal}. \tag{4.37} \]

In a second step, the issue of a multibeta model for expected returns is addressed:¹⁶

\[ E[r_{t+1} \mid U_t] = \beta_t E[F_{t+1} \mid U_t]. \tag{4.38} \]

15 Chamberlain and Rothschild (1983) have proposed to take advantage of the sequence model (n → ∞) to weaken the diagonality assumption on $D_t$ by defining an approximate factor structure. We consider here a factor structure for fixed n.

Due to the difficulty of disentangling the dynamics of the beta coefficients in $\beta_t$ from the one of the factors, both at first order $E[F_{t+1} \mid U_t]$ in (4.38) and at second order $\phi_t = \mathrm{Var}[F_{t+1} \mid U_t]$ in (4.37), a common solution in the literature is to add the quite restrictive assumption that the matrix $\beta_t$ of conditional factor loadings is deterministic and time invariant:

\[ \beta_t = \beta \quad \text{for every } t. \tag{4.39} \]

It should be noticed that assumption (4.39) does not imply per se that conditional betas coincide with unconditional ones since unconditional betas are not unconditional expectations of conditional ones. However, since by (4.39):

\[ r_{t+1} = E(r_{t+1} \mid U_t) - \beta E(F_{t+1} \mid U_t) + \beta F_{t+1} + \varepsilon_{t+1}, \tag{4.40} \]

it can be seen that β will coincide with the matrix of unconditional betas if and only if: Cov[E(rt+1 |Ut ) − β E(Ft+1 |Ut ), Ft+1 |Ut ] = 0.

(4.41)

In particular, if the conditional multibeta model (4.38) of expected returns and the assumption (4.39) of constant conditional betas are maintained simultaneously, the unconditional multibeta model of expected returns can be deduced: Ert+1 = β E Ft+1 .

(4.42)

Moreover, this joint assumption guarantees that the conditional factor analytic model (4.40) can be identified by a standard procedure of static factor analysis since: Var(εt+1 ) = E(Var(ε t+1 |Ut )) = E(Dt )

(4.43)

will be a diagonal matrix, as $D_t$ is. This remark has been fully exploited by King, Sentana and Wadhwani (1994). However, a general inference methodology for the

16 According to the comments following Theorem 4.4, we assume that factors are suitably scaled in order to get the convenient interpretation for the coefficients of the multibeta model of expected returns. Such a scaling can be done without loss of generality since it does not modify the property (4.37). Moreover, in (4.38), returns and factors are implicitly considered in excess of the risk-free rate (net returns and factors).

R. Garcia and É. Renault

conditional factor analytic model remains to be stated. First, the restrictive assumption of fixed conditional betas should be relaxed. Second, even with fixed betas, one would like to be able to identify the conditional factor analytic model (4.40) without maintaining the joint hypothesis (4.38) of a multibeta model of expected returns. In this latter case, a factor stochastic volatility approach (see e.g. Meddahi and Renault (1996) and Pitt and Shephard (1999)) should be well-suited. The close link between our general state variable setting and the now widespread stochastic volatility model is discussed in the next section.

5 A dynamic asset pricing model with latent variables

In the last section, we analyzed the cross-sectional restrictions imposed by financial asset pricing theories in the context of factor models. While these factor models were conditioned on an information set, the emphasis was not put on the dynamic behavior of asset returns. In this section, we propose an intertemporal asset pricing model based on a conditioning on state variables. Using assumptions spelled out in Section 3, we will accommodate a rich intertemporal framework where the stochastic discount factor can represent nonseparable preferences such as recursive utility.¹⁷

5.1 An equilibrium asset pricing model with recursive utility

Many identical infinitely lived agents maximize their lifetime utility and receive each period an endowment of a single nonstorable good. We specify a recursive utility function of the form:

$$V_t = W(C_t, \mu_t), \tag{5.1}$$

where $W$ is an aggregator function that combines current consumption $C_t$ with $\mu_t = \mu(V_{t+1} \mid J_t)$, a certainty equivalent of random future utility $V_{t+1}$, given the information available to the agents at time $t$, to obtain the current-period lifetime utility $V_t$. Following Kreps and Porteus (1978), Epstein and Zin (1989) propose the CES function as the aggregator function, i.e.

$$V_t = \left[C_t^{\rho} + \beta \mu_t^{\rho}\right]^{1/\rho}. \tag{5.2}$$
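A minimal sketch of the CES aggregator (5.2) may help fix ideas. The function name and parameter values below are hypothetical. The check at the end uses the simplest possible case, a deterministic constant consumption path, for which the certainty equivalent of next-period utility is that utility itself, so that lifetime utility solves $V = W(C, V)$, i.e. $V = C(1-\beta)^{-1/\rho}$.

```python
def ces_aggregator(c, mu, rho, beta):
    """CES aggregator of (5.2): V = (c**rho + beta * mu**rho) ** (1 / rho).

    c, mu are assumed strictly positive; rho must be non-zero and at most 1.
    """
    if rho == 0:
        raise ValueError("rho = 0 (log aggregator) is a limiting case not handled here")
    return (c**rho + beta * mu**rho) ** (1.0 / rho)

# Hypothetical parameter values.
beta, rho, c = 0.95, 0.5, 1.0

# With constant, deterministic consumption, V is a fixed point of mu -> W(c, mu).
v_star = c * (1.0 - beta) ** (-1.0 / rho)
v_check = ces_aggregator(c, v_star, rho, beta)
```

With $\rho = 1$ the aggregator is linear in $(C_t, \mu_t)$, which corresponds to an infinite elasticity of intertemporal substitution.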

The way the agents form the certainty equivalent of random future utility is based on their risk preferences, which are assumed to be isoelastic, i.e. $\mu_t^{\alpha} = E[V_{t+1}^{\alpha} \mid I_t]$,

17 In the proposed intertemporal asset pricing model, we will specify the stochastic discount factor in an equilibrium setting. We will therefore make our stochastic assumptions on economic fundamentals such as consumption and dividend growth rates. In Garcia, Luger and Renault (1999), we make the same types of assumptions directly on the pair SDF-stock returns without reference to an equilibrium model. Similar asset pricing formulas and implications of the presence of leverage effects are obtained in this less specific framework.


where $\alpha \le 1$ is the risk aversion parameter ($1 - \alpha$ is the Arrow–Pratt measure of relative risk aversion). Given these preferences, the following Euler condition must be valid for any asset $j$ if an agent maximizes his lifetime utility (see Epstein and Zin (1989)):

$$E\left[\beta^{\gamma} \left(\frac{C_{t+1}}{C_t}\right)^{\gamma(\rho-1)} M_{t+1}^{\gamma-1} R_{j,t+1} \,\Big|\, J_t\right] = 1, \tag{5.3}$$

where $M_{t+1}$ represents the return on the market portfolio, $R_{j,t+1}$ the return on any asset $j$, and $\gamma = \alpha/\rho$. The stochastic discount factor is therefore given by:

$$m_{t+1} = \beta^{\gamma} \left(\frac{C_{t+1}}{C_t}\right)^{\gamma(\rho-1)} M_{t+1}^{\gamma-1}. \tag{5.4}$$

The parameter $\rho$ is associated with intertemporal substitution, since the elasticity of intertemporal substitution is $1/(1-\rho)$. The position of $\alpha$ with respect to $\rho$ determines whether the agent has a preference towards early resolution of uncertainty ($\alpha < \rho$) or late resolution of uncertainty ($\alpha > \rho$).¹⁸ Since the market portfolio price, say $P_t^M$ at time $t$, is determined in equilibrium, it should also verify the first-order condition:

$$E\left[\beta^{\gamma} \left(\frac{C_{t+1}}{C_t}\right)^{\gamma(\rho-1)} M_{t+1}^{\gamma} \,\Big|\, J_t\right] = 1. \tag{5.5}$$

In this model, the payoff of the market portfolio at time $t$ is the total endowment of the economy $C_t$. Therefore the return on the market portfolio $M_{t+1}$ can be written as follows:

$$M_{t+1} = \frac{P_{t+1}^M + C_{t+1}}{P_t^M}.$$

Replacing $M_{t+1}$ by this expression, we obtain:

$$\lambda_t^{\gamma} = E\left[\beta^{\gamma} \left(\frac{C_{t+1}}{C_t}\right)^{\gamma\rho} (\lambda_{t+1} + 1)^{\gamma} \,\Big|\, J_t\right], \tag{5.6}$$

where $\lambda_t = P_t^M / C_t$. The pricing of assets with price $S_t$ which pay dividends $D_t$, such as stocks, will lead us to characterize the joint probability distribution of the stochastic process $(X_t, Y_t, J_t)$ where $X_t = \log(C_t/C_{t-1})$ and $Y_t = \log(D_t/D_{t-1})$. As announced in Section 3, we define these dynamics through a stationary vector process of state variables $U_t$ so that:

$$J_t = \vee_{\tau \le t} [X_\tau, Y_\tau, U_\tau]. \tag{5.7}$$

18 As mentioned in Epstein and Zin (1991), the association of risk aversion with $\alpha$ and intertemporal substitution with $\rho$ is not fully clear, since at a given level $\alpha$ of risk aversion, changing $\rho$ affects not only the elasticity of intertemporal substitution but also determines whether the agent will prefer early or late resolution of uncertainty.


Given this model structure (with $\log(C_t/C_{t-1})$ serving as a factor $F_t$), we can restate Assumptions 3.1 and 3.2 as:

Assumption 5.1 The pairs $(X_t, Y_t)$, $t = 1, \ldots, T$, are mutually independent given $U_1^T = (U_t)_{1 \le t \le T}$.

Assumption 5.2 The conditional probability distribution of $(X_t, Y_t)$ given $U_1^T = (U_t)_{1 \le t \le T}$ coincides, for any $t = 1, \ldots, T$, with the conditional probability distribution given $U_1^t = (U_\tau)_{1 \le \tau \le t}$.

As mentioned in Section 3, Assumptions 5.1 and 5.2 together with Assumption 3.3 and the Markovianity of state variables $U_t$ allow us to characterize the joint probability distribution of the $(X_t, Y_t)$ pairs, $t = 1, \ldots, T$, given $U_1^T$, by:

$$\ell[(X_t, Y_t)_{1 \le t \le T} \mid U_1^T] = \prod_{t=1}^{T} \ell[X_t, Y_t \mid U_t]. \tag{5.8}$$

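Proposition 5.3 below provides the exact relationship between the state variables and equilibrium prices.

Proposition 5.3 Under Assumptions 5.1 and 5.2 we have:

$$P_t^M = \lambda(U_t) C_t, \qquad S_t = \varphi(U_t) D_t,$$

where $\lambda(U_t)$ and $\varphi(U_t)$ are respectively defined by:

$$\lambda(U_t)^{\gamma} = E\left[\beta^{\gamma} \left(\frac{C_{t+1}}{C_t}\right)^{\gamma\rho} (\lambda(U_{t+1}) + 1)^{\gamma} \,\Big|\, U_t\right],$$

and

$$\varphi(U_t) = E\left[\beta^{\gamma} \left(\frac{C_{t+1}}{C_t}\right)^{\gamma\rho-1} \left(\frac{\lambda(U_{t+1}) + 1}{\lambda(U_t)}\right)^{\gamma-1} (\varphi(U_{t+1}) + 1) \frac{D_{t+1}}{D_t} \,\Big|\, U_t\right].$$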

Therefore, the functions $\lambda(\cdot)$, $\varphi(\cdot)$ are defined on $\mathbb{R}^P$ if there are $P$ state variables. Moreover, the stationarity property of the $U$ process together with Assumptions 5.1, 5.2 and a suitable specification of the density function (3.6) allow us to make the process $(X, Y)$ stationary by a judicious choice of the initial distribution of $(X, Y)$. In this setting, a contraction mapping argument may be applied as in Lucas (1978) to characterize the functions $\lambda(\cdot)$ and $\varphi(\cdot)$ according to Proposition 5.3. It should be stressed that this framework is more general than the Lucas one because the state variables $U_t$ are given by a general multivariate Markovian process (while a Markovian dividend process is the only state variable in Lucas


(1978)). Using the return definition for the market portfolio and asset $S_t$, we can write:

$$\log M_{t+1} = \log \frac{\lambda(U_{t+1}) + 1}{\lambda(U_t)} + X_{t+1}, \quad \text{and} \quad \log R_{t+1} = \log \frac{\varphi(U_{t+1}) + 1}{\varphi(U_t)} + Y_{t+1}. \tag{5.9}$$

Hence, the return processes (Mt+1 , Rt+1 ) are stationary as U, X and Y , but, contrary to the stochastic setting in the Lucas (1978) economy, are not Markovian due to the presence of unobservable state variables U . Given this intertemporal model with latent variables, we will show how standard asset pricing models will appear as particular cases under some specific configurations of the stochastic framework. In particular, we will analyze the pricing of bonds, stocks and options and show under which conditions the usual models such as the CAPM or the Black–Scholes model are obtained.
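To make the fixed-point characterization of $\lambda(\cdot)$ in Proposition 5.3 concrete, here is a small numerical sketch. It rests on assumptions that are not in the text: the state variable is a finite Markov chain with transition matrix `trans`, and consumption growth $X_{t+1}$ is conditionally normal with mean and variance depending only on the arrival state $U_{t+1}$, so that $E[(C_{t+1}/C_t)^{\gamma\rho} \mid U_{t+1}] = \exp(\alpha m_X + \tfrac12 \alpha^2 \sigma_X^2)$ with $\alpha = \gamma\rho$. All names and parameter values are hypothetical.

```python
import numpy as np

def solve_price_consumption_ratio(trans, m_x, s2_x, alpha, rho, beta,
                                  tol=1e-10, max_iter=100_000):
    """Iterate lambda^gamma = E[beta^gamma exp(alpha X') (lambda' + 1)^gamma | U].

    trans : (N, N) transition matrix of the (assumed) finite-state chain
    m_x   : (N,) mean of consumption growth given the arrival state
    s2_x  : (N,) variance of consumption growth given the arrival state
    """
    gamma = alpha / rho
    # Lognormal moment E[exp(alpha * X') | arrival state]
    growth = np.exp(alpha * m_x + 0.5 * alpha**2 * s2_x)
    lam = np.ones_like(m_x, dtype=float)
    for _ in range(max_iter):
        rhs = beta**gamma * (trans @ (growth * (lam + 1.0) ** gamma))
        lam_new = rhs ** (1.0 / gamma)
        if np.max(np.abs(lam_new - lam)) < tol:
            return lam_new
        lam = lam_new
    raise RuntimeError("fixed-point iteration did not converge")

def log_market_return(lam, u, u_next, x_next):
    """First part of (5.9): log M' = log((lambda(U') + 1) / lambda(U)) + X'."""
    return np.log((lam[u_next] + 1.0) / lam[u]) + x_next

# Hypothetical two-state example: state 0 = low growth / high variance.
alpha, rho, beta = -1.0, 0.5, 0.97
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
m_x = np.array([0.01, 0.03])
s2_x = np.array([0.0009, 0.0001])
lam = solve_price_consumption_ratio(trans, m_x, s2_x, alpha, rho, beta)

# Degenerate one-state case, where a closed form is available for comparison.
lam_iid = solve_price_consumption_ratio(np.array([[1.0]]), np.array([0.02]),
                                        np.array([0.0004]), alpha, rho, beta)
```

In the one-state case the recursion reduces to $\lambda/(\lambda+1) = \beta \exp\{(\alpha m_X + \tfrac12\alpha^2\sigma_X^2)/\gamma\}$, which provides a simple check of the iteration.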

5.2 Revisiting asset pricing theories for bonds, stocks and options through the leverage effect

In this section, we introduce an additional assumption on the probability distribution of the fundamentals $X$ and $Y$ given the state variables $U$.

Assumption 5.4

$$\begin{pmatrix} X_{t+1} \\ Y_{t+1} \end{pmatrix} \Big|\, U_t^{t+1} \sim \mathcal{N}\left[ \begin{pmatrix} m_{X,t+1} \\ m_{Y,t+1} \end{pmatrix}, \begin{pmatrix} \sigma^2_{X,t+1} & \sigma_{XY,t+1} \\ \sigma_{XY,t+1} & \sigma^2_{Y,t+1} \end{pmatrix} \right],$$

where $m_{X,t+1} = m_X(U_1^{t+1})$, $m_{Y,t+1} = m_Y(U_1^{t+1})$, $\sigma^2_{X,t+1} = \sigma^2_X(U_1^{t+1})$, $\sigma_{XY,t+1} = \sigma_{XY}(U_1^{t+1})$, $\sigma^2_{Y,t+1} = \sigma^2_Y(U_1^{t+1})$.

In other words, these mean and variance-covariance functions are time-invariant and measurable functions with respect to $U_t^{t+1}$, which includes both $U_t$ and $U_{t+1}$. This conditional normality assumption allows for skewness and excess kurtosis in unconditional returns. It is also useful for recovering as a particular case the Black–Scholes formula.¹⁹

19 It can also be argued that, if one considers that the discrete-time interval is somewhat arbitrary and can be infinitely split, log-normality (conditional on state variables $U$) is obtained as a consequence of a standard central limit argument given the independence between consecutive $(X, Y)$ given $U$.


5.2.1 The pricing of bonds

The price of a bond delivering one unit of the good at time $T$, $B(t, T)$, is given by the following formula:

$$B(t, T) = E_t[\tilde B(t, T)], \tag{5.10}$$

where:

$$\tilde B(t, T) = \beta^{\gamma(T-t)}\, a_t^T(\gamma)\, \exp\left( (\alpha - 1) \sum_{\tau=t}^{T-1} m_{X,\tau+1} + \frac{1}{2} (\alpha - 1)^2 \sum_{\tau=t}^{T-1} \sigma^2_{X,\tau+1} \right),$$

with: $a_t^T(\gamma) = \prod_{\tau=t}^{T-1} \left[ \dfrac{1 + \lambda(U_1^{\tau+1})}{\lambda(U_1^{\tau})} \right]^{\gamma-1}$.

This formula shows how the interest rate risk is compensated in equilibrium, and in particular how the term premium is related to preference parameters. To be more explicit about the relationship between the term premium and the preference parameters, let us first notice that we have a natural factorization:

$$\tilde B(t, T) = \prod_{\tau=t}^{T-1} \tilde B(\tau, \tau + 1). \tag{5.11}$$

Therefore, while the discount parameter $\beta$ affects the level of $\tilde B$, the two other parameters $\alpha$ and $\gamma$ affect the term premium (with respect to the return-to-maturity expectations hypothesis, Cox, Ingersoll, and Ross (1981)) through the ratio:

$$\frac{B(t, T)}{E_t \prod_{\tau=t}^{T-1} B(\tau, \tau + 1)} = \frac{E_t \left( \prod_{\tau=t}^{T-1} \tilde B(\tau, \tau + 1) \right)}{E_t \prod_{\tau=t}^{T-1} E_\tau \tilde B(\tau, \tau + 1)}.$$

To better understand this term premium from an economic point of view, let us compare implicit forward rates and expected spot rates at only one intermediary period $\tau$ between $t$ and $T$:

$$\frac{B(t, T)}{B(t, \tau)} = \frac{E_t[\tilde B(t, \tau)\, \tilde B(\tau, T)]}{E_t \tilde B(t, \tau)} = E_t \tilde B(\tau, T) + \frac{\mathrm{Cov}_t[\tilde B(t, \tau), \tilde B(\tau, T)]}{E_t \tilde B(t, \tau)}. \tag{5.12}$$

Up to Jensen's inequality, Equation (5.12) proves that a positive term premium is brought about by a negative covariation between present and future $\tilde B$. Given the expression for $\tilde B(t, T)$ above, it can be seen that for von Neumann preferences ($\gamma = 1$) the term premium is proportional to the square of the coefficient of relative risk aversion (up to a conditional stochastic volatility effect). Another important observation is that even without any risk aversion ($\alpha = 1$), preferences still affect the term premium through the nonindifference to the timing of uncertainty resolution ($\gamma \ne 1$). There is however an important sub-case where the term premium will be preference-free because the stochastic discount factor $\tilde B(t, T)$ coincides with the


observed rolling-over discount factor (the product of short-term future bond prices, $B(\tau, \tau+1)$, $\tau = t, \ldots, T-1$). Taking Equation (5.11) into account, this will occur as soon as $\tilde B(\tau, \tau + 1) = B(\tau, \tau + 1)$, that is when $\tilde B(\tau, \tau + 1)$ is known at time $\tau$. From the expression of $\tilde B(t, T)$ above, it is easy to see that this last property holds if and only if the mean and variance parameters $m_{X,\tau+1}$ and $\sigma_{X,\tau+1}$ depend on $U_\tau^{\tau+1}$ only through $U_\tau$. This allows us to highlight the so-called “leverage effect”, which appears when the probability distribution of $X_{t+1}$ given $U_t^{t+1}$ depends (through the functions $m_X$, $\sigma^2_X$) on the contemporaneous value $U_{t+1}$ of the state process. Otherwise, the noncausality Assumption 5.2 can be reinforced by assuming no instantaneous causality from $X$ to $U$. In this case, $\ell(X_t \mid U_1^T) = \ell(X_t \mid U_1^{t-1})$; it is this property which ensures that short-term stochastic discount factors are predetermined, so the bond pricing formula becomes preference-free:

$$B(t, T) = E_t \prod_{\tau=t}^{T-1} B(\tau, \tau + 1).$$
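The forward-price decomposition in Equation (5.12) is an exact identity and can be checked on any discrete set of scenarios. The sketch below does so with two equally likely scenarios for the near-dated and far-dated stochastic discount factors; the function name and the numbers are hypothetical and chosen only so that the two discount factors move in opposite directions.

```python
import numpy as np

def forward_decomposition(prob, b_near, b_far):
    """Split B(t,T)/B(t,tau) as in (5.12), over discrete scenarios.

    prob   : scenario probabilities (summing to one)
    b_near : scenario values of the discount factor from t to tau
    b_far  : scenario values of the discount factor from tau to T
    Returns (forward price, expected future price, covariance term).
    """
    e_near = prob @ b_near
    e_far = prob @ b_far
    e_prod = prob @ (b_near * b_far)
    cov = e_prod - e_near * e_far
    return e_prod / e_near, e_far, cov / e_near

prob = np.array([0.5, 0.5])
b_near = np.array([0.94, 0.98])   # hypothetical values
b_far = np.array([0.97, 0.95])    # negatively related to b_near
forward, expected_future, cov_term = forward_decomposition(prob, b_near, b_far)
```

Here the covariance term is negative, so the implicit forward price lies below the expected future discount factor, which is the configuration the text associates with a positive term premium.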

Of course this does not necessarily cancel the term premiums but it makes them preference-free in the sense that the role of preference parameters is fully hidden in short-term bond prices. Moreover, when there is no interest rate risk because the consumption growth rates $X_t$ are serially independent, it is straightforward to check that constant $m_{X,t+1}$ and $\sigma^2_{X,t+1}$ imply constant $\lambda(\cdot)$ and in turn $\tilde B(t, T) = B(t, T)$, with zero term premiums.

5.2.2 The pricing of stocks

The stock price formula is given by:

$$S_t = E_t\left[ \beta^{\gamma} \left(\frac{C_{t+1}}{C_t}\right)^{\alpha-1} \left( \frac{1 + \lambda(U_1^{t+1})}{\lambda(U_1^{t})} \right)^{\gamma-1} (S_{t+1} + D_{t+1}) \right].$$

By a recursive argument, this Euler condition can be rewritten as follows:

$$E_t\left[ \beta^{\gamma(T-t)}\, a_t^T(\gamma)\, b_t^T \left(\frac{C_T}{C_t}\right)^{\alpha-1} \frac{D_T}{D_t} \right] = 1, \tag{5.13}$$

with: $b_t^T = \prod_{\tau=t}^{T-1} \dfrac{1 + \varphi(U_1^{\tau+1})}{\varphi(U_1^{\tau})}$. Under the conditional log-normality Assumption 5.4, we obtain:

$$E_t\left[ \tilde B(t, T)\, b_t^T \exp\left( \sum_{\tau=t+1}^{T} m_{Y,\tau} + \frac{1}{2} \sum_{\tau=t+1}^{T} \sigma^2_{Y,\tau} + (\alpha - 1) \sum_{\tau=t+1}^{T} \sigma_{XY,\tau} \right) \right] = 1. \tag{5.14}$$


With the definitional equation:

$$E\left[ \frac{S_T}{S_t} \,\Big|\, U_1^T \right] = \frac{\varphi(U_1^T)}{\varphi(U_1^t)} \exp\left( \sum_{\tau=t+1}^{T} m_{Y,\tau} + \frac{1}{2} \sum_{\tau=t+1}^{T} \sigma^2_{Y,\tau} \right), \tag{5.15}$$

a useful way of writing the stock pricing formula is:

$$E_t[Q_{XY}(t, T)] = 1, \tag{5.16}$$

where:

$$Q_{XY}(t, T) = \tilde B(t, T)\, b_t^T \exp\left( (\alpha - 1) \sum_{\tau=t+1}^{T} \sigma_{XY,\tau} \right) \frac{\varphi(U_1^t)}{\varphi(U_1^T)}\, E\left[ \frac{S_T}{S_t} \,\Big|\, U_1^T \right]. \tag{5.17}$$

To understand the role of the factor $Q_{XY}(t, T)$, it is useful to notice that it can be factorized:

$$Q_{XY}(t, T) = \prod_{\tau=t}^{T-1} Q_{XY}(\tau, \tau + 1),$$

and that there is an important particular case where $Q_{XY}(\tau, \tau+1)$ is known at time $\tau$ and therefore equal to one by (5.16). This is when there is no leverage effect in the sense that $\ell(X_t, Y_t \mid U_1^T) = \ell(X_t, Y_t \mid U_1^{t-1})$. This means not only that there is no leverage effect for either $X$ or $Y$, but also that the instantaneous covariance $\sigma_{XY,t}$ itself does not depend on $U_t$. In this case, we have $Q_{XY}(t, T) = 1$. Since we also have $\tilde B(\tau, \tau + 1) = B(\tau, \tau + 1)$, we can express the conditional expected stock return as:

$$E\left[ \frac{S_T}{S_t} \,\Big|\, U_1^T \right] = \frac{1}{\prod_{\tau=t}^{T-1} B(\tau, \tau + 1)}\, \frac{1}{b_t^T}\, \frac{\varphi(U_1^T)}{\varphi(U_1^t)} \exp\left( (1 - \alpha) \sum_{\tau=t+1}^{T} \sigma_{XY,\tau} \right).$$

For pricing over one period ($t$ to $t+1$), this formula provides the agent's expectation of the next period return (since in this case the only relevant information is $U_1^t$):

$$E\left[ \frac{S_{t+1}}{S_t}\, \frac{1 + \varphi(U_1^{t+1})}{\varphi(U_1^{t+1})} \,\Big|\, U_1^t \right] = \frac{1}{B(t, t+1)} \exp[(1 - \alpha)\sigma_{XY,t+1}],$$

that is:

$$E\left[ \frac{S_{t+1} + D_{t+1}}{S_t} \,\Big|\, U_1^t \right] = \frac{1}{B(t, t+1)} \exp[(1 - \alpha)\sigma_{XY,t+1}]. \tag{5.18}$$

This is a particularly striking result since it is very close to a standard conditional CAPM equation, which remains true for any value of the preference parameters α and ρ. While Epstein and Zin (1991) emphasize that the CAPM obtains for α = 0 (logarithmic utility) or ρ = 1 (infinite elasticity of intertemporal substitution), we stress here that the relation is obtained under a particular stochastic setting for any


values of $\alpha$ and $\rho$. Remarkably, the stochastic setting without leverage effect which produces this CAPM relationship will also produce most standard option pricing models (for example Black and Scholes (1973) and Hull and White (1987)), which are of course preference-free.²⁰

5.2.3 A generalized option pricing formula

The Euler condition for the price of a European option is given by:

$$\pi_t = E_t\left[ \beta^{\gamma(T-t)} \left(\frac{C_T}{C_t}\right)^{\alpha-1} \prod_{\tau=t}^{T-1} \left( \frac{1 + \lambda(U_1^{\tau+1})}{\lambda(U_1^{\tau})} \right)^{\gamma-1} \mathrm{Max}[0, S_T - K] \right]. \tag{5.19}$$

It is worth noting that the option pricing formula (5.19) is path-dependent with respect to the state variables; it depends not only on the initial and terminal values of the process $U_t$ but also on its intermediate values.²¹ Indeed, it is not so surprising that when preferences are not time-separable ($\gamma \ne 1$), the option price may depend on the whole past of the state variables. Using Assumptions 5.1, 5.2 and 5.4, we arrive at an extended Black–Scholes formula:

$$\frac{\pi_t}{S_t} = E_t\left\{ Q^*_{XY}(t, T)\, \Phi(d_1) - \frac{K \tilde B(t, T)}{S_t}\, \Phi(d_2) \right\}, \tag{5.20}$$

where:

$$d_1 = \frac{\log\left[ \dfrac{S_t\, Q^*_{XY}(t, T)}{K \tilde B(t, T)} \right]}{\left( \sum_{\tau=t+1}^{T} \sigma^2_{Y,\tau} \right)^{1/2}} + \frac{1}{2} \left( \sum_{\tau=t+1}^{T} \sigma^2_{Y,\tau} \right)^{1/2}, \qquad d_2 = d_1 - \left( \sum_{\tau=t+1}^{T} \sigma^2_{Y,\tau} \right)^{1/2},$$

and

$$Q^*_{XY}(t, T) = Q_{XY}(t, T)\, \frac{\varphi(U_1^T)}{\varphi(U_1^t)\, b_t^T}. \tag{5.21}$$
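The term inside the expectation in (5.20) has the familiar Black–Scholes shape, so it can be sketched in a few lines. In the snippet below, `conditional_option_price` evaluates that term for one given path of the state variables, taking $\tilde B(t,T)$, $Q^*_{XY}(t,T)$ and the cumulated variance $\sum_\tau \sigma^2_{Y,\tau}$ as already computed inputs, and `option_price` averages over a finite list of scenarios. The function names, the finite-scenario approximation of $E_t$ and all numbers are assumptions made for illustration.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def conditional_option_price(s, k, b_tilde, q_star, cum_var):
    """Integrand of (5.20) times S_t, for one path of the state variables.

    s, k     : current stock price and strike
    b_tilde  : stochastic discount factor B~(t, T) along the path
    q_star   : Q*_XY(t, T) along the path
    cum_var  : cumulated variance, sum of sigma^2_Y over (t, T]
    """
    vol = sqrt(cum_var)
    d1 = log(s * q_star / (k * b_tilde)) / vol + 0.5 * vol
    d2 = d1 - vol
    return s * q_star * norm_cdf(d1) - k * b_tilde * norm_cdf(d2)

def option_price(s, k, scenarios):
    """Average over scenarios given as (probability, b_tilde, q_star, cum_var)."""
    return sum(p * conditional_option_price(s, k, b, q, v) for p, b, q, v in scenarios)

# With Q* = 1 and a deterministic discount factor exp(-r), one scenario
# reproduces a plain Black-Scholes call (r = 5%, sigma = 20%, one period).
bs_price = conditional_option_price(100.0, 100.0, exp(-0.05), 1.0, 0.04)

# Hypothetical two-scenario mixture over the cumulated variance only.
scenarios = [(0.5, exp(-0.05), 1.0, 0.02), (0.5, exp(-0.05), 1.0, 0.06)]
mixed_price = option_price(100.0, 100.0, scenarios)
```

The second computation is a two-point version of a volatility-mixing formula: the discount factor and $Q^*_{XY}$ are held fixed and only the cumulated variance differs across scenarios.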

To put this general formula in perspective, we will compare it to the three main approaches that have been used for pricing options: equilibrium option pricing, arbitrage-based option pricing, and GARCH option pricing. The latter pricing model can be set either in an equilibrium framework or in an arbitrage framework. Concerning the equilibrium approach, our setting is more general than

20 A similar parallel is drawn in an unconditional two-period framework in Breeden and Litzenberger (1978).

21 Since we assume that the state variable process is Markovian, $\lambda(U_1^T)$ does not depend on the whole path of state variables but only on the last values $U_T$.


the usual expected utility framework since it accommodates non-separable preferences. The stochastic framework with latent variables could also accommodate state-dependent preferences such as habit formation based on state variables. Of course, the most popular option pricing formulas among practitioners are based on arbitrage rather than on equilibrium, in particular to avoid the specification of preferences. From the start, it should be stressed that our general formula (5.20) nests a large number of preference-free extensions of the Black–Scholes formula. In particular, if $Q_{XY}(t, T) = 1$ and $\tilde B(t, T) = \prod_{\tau=t}^{T-1} B(\tau, \tau + 1)$, one can see that the option price (5.20) is nothing but the conditional expectation of the Black–Scholes price,²² where the expectation is computed with respect to the joint probability distribution of the rolling-over interest rate $\bar r_{t,T} = -\sum_{\tau=t}^{T-1} \log B(\tau, \tau + 1)$ and the cumulated volatility $\bar\sigma_{t,T} = \left( \sum_{\tau=t+1}^{T} \sigma^2_{Y,\tau} \right)^{1/2}$. This framework nests three well-known models. First, the most basic ones, the Black and Scholes (1973) and Merton (1973) formulas, when interest rates and volatility are deterministic. Second, the Hull and White (1987) stochastic volatility extension, since $\bar\sigma^2_{t,T} = \mathrm{Var}\left[ \log \frac{S_T}{S_t} \,\big|\, U_1^T \right]$ corresponds to the cumulated volatility $\int_t^T \sigma_u^2\, du$ in the Hull and White continuous-time setting.²³ Third, the formula allows for stochastic interest rates as in Turnbull and Milne (1991) and Amin and Jarrow (1992).

However, the usefulness of our general formula (5.20) comes above all from the fact that it offers an explicit characterization of instances where the preference-free paradigm cannot be maintained. Usually, preference-free option pricing is underpinned by the absence of arbitrage in a complete market setting. However, our equilibrium-based option pricing does not preclude incompleteness and points out in which cases this incompleteness will invalidate the preference-free paradigm. The only cases of incompleteness which matter in this respect occur precisely when at least one of the two following conditions:

$$Q_{XY}(t, T) = 1 \tag{5.25}$$

$$\tilde B(t, T) = \prod_{\tau=t}^{T-1} B(\tau, \tau + 1) \tag{5.26}$$

is not fulfilled. In general, preference parameters appear explicitly in the option pricing formula through $\tilde B(t, T)$ and $Q_{XY}(t, T)$. However, in so-called preference-free formulas, it happens that these parameters are eliminated from the option pricing formula through the observation of the bond price and the stock price. In other words, even in an equilibrium framework with incomplete markets, option pricing is preference-free if and only if there is no leverage effect in the general sense that $Q_{XY}(t, t + 1)$ and $\tilde B(t, t + 1)$ are predetermined. This result generalizes Amin and Ng (1993), who called this effect predictability. It is worth noting that our results of equivalence between preference-free option pricing and no instantaneous causality between state variables and asset returns are consistent with another strand of the option pricing literature, namely GARCH option pricing. Duan (1995) derived it first in an equilibrium framework, but Kallsen and Taqqu (1998) have shown that it could be obtained with an arbitrage argument. Their idea is to complete the markets by inserting the discrete-time model into a continuous-time one, where conditional variance is constant between two integer dates. They show that such a continuous-time embedding makes possible arbitrage pricing which is per se preference-free. It is then clear that preference-free option pricing is incompatible with the presence of an instantaneous causality effect, since it is such an effect that prevents the embedding used by Kallsen and Taqqu (1998).

22 We refer here to a BS option pricing formula where dividend flows arrive during the lifetime of the option and are accounted for in the definition of the risk neutral probability, while the option payoff does not include dividends. In other words, the BS option price is given by:

$$\pi_t^{BS} = e^{-r(T-t)} E_t[\mathrm{Max}(0, S_T - K)] \tag{5.22}$$

$$\phantom{\pi_t^{BS}} = e^{-\delta(T-t)} S_t \Phi(d_1) - K e^{-r(T-t)} \Phi(d_2), \tag{5.23}$$

since in the risk neutral world:

$$\log \frac{S_T}{S_t} \sim \mathcal{N}\left( \left(r - \delta - \tfrac{1}{2}\sigma^2\right)(T - t),\ \sigma^2 (T - t) \right), \tag{5.24}$$

where $\delta$ is the intensity of the dividend flow.

23 See Subsection 5.3 for a detailed comparison between standard stochastic volatility models and our state variable framework.

5.3 A comparison with stochastic volatility models

The typical stochastic volatility model (SV model hereafter) introduces a positive stochastic process such that its squared value $h_t$ represents the conditional variance of the value at time $(t + 1)$ of a second-order stationary process of interest, given a conditioning information set $J_t$. In our setting, it is natural to define the conditioning information set $J_t$ by (5.7). It means that the information available at time $t$ is not summarized in general by the observation of past and current values of asset prices, since it also encompasses additional information through state variables $U_t$. Such a definition is consistent with the modern definition of SV processes (see Ghysels, Harvey and Renault, 1996, for a survey). It incorporates unobserved components that might capture well-documented evidence about conditional leptokurtosis and leverage effects of asset returns (given past and current returns). Moreover, such unobserved components are included in the relevant conditioning information set for option pricing models as in Hull and White (1987). The focus of interest in this subsection is the time series properties of asset returns implied


by the dynamic asset pricing model presented in Section 5.1. These time series of returns can be seen as stochastic volatility processes by Assumption 5.4 on the conditional probability distribution of the fundamentals $(X_{t+1}, Y_{t+1})$ given $J_t$. We focus on $(X_{t+1}, Y_{t+1})$ instead of asset returns since, by (5.9), the joint conditional probability distribution (given $U_1^{t+1}$) of returns for the two primitive assets is defined by Assumption 5.4 up to a shift in the mean. Let us first consider the univariate dynamics in terms of the innovation process $\eta_{Y,t+1}$ of $Y_{t+1}$ with respect to $J_t$, defined as:

$$\eta_{Y,t+1} = Y_{t+1} - E[m_Y(U_1^{t+1}) \mid U_1^t]. \tag{5.27}$$

The associated volatility and kurtosis dynamics are then characterized by:

$$h_t^Y = \mathrm{Var}[\eta_{Y,t+1} \mid U_1^t] = \mathrm{Var}[m_Y(U_1^{t+1}) \mid U_1^t] + E[\sigma^2_Y(U_1^{t+1}) \mid U_1^t] \tag{5.28}$$

and

$$\mu_{4t}^Y = E[\eta^4_{Y,t+1} \mid U_1^t] = 3E[\sigma^4_Y(U_1^{t+1}) \mid U_1^t] = 3\left[ \mathrm{Var}[\sigma^2_Y(U_1^{t+1}) \mid U_1^t] + \left( E[\sigma^2_Y(U_1^{t+1}) \mid U_1^t] \right)^2 \right]. \tag{5.29}$$

As far as kurtosis is concerned, Equations (5.28) and (5.29) provide a representation of the fat-tail effect and its dynamics, sometimes termed the heterokurtosis effect. This extends the representation of the standard mixture model, first introduced by Clark (1973) and extended by Gallant, Hsieh and Tauchen (1991). Indeed, in the particular case where:

$$\mathrm{Var}[m_Y(U_1^{t+1}) \mid U_1^t] = 0, \tag{5.30}$$

we get the following expression²⁴ for the conditional kurtosis coefficient:

$$\frac{\mu_{4t}^Y}{(h_t^Y)^2} = 3\left[ 1 + (c_t^Y)^2 \right] \tag{5.31}$$

with:

$$c_t^Y = \frac{\left( \mathrm{Var}[\sigma^2_Y(U_1^{t+1}) \mid U_1^t] \right)^{1/2}}{E[\sigma^2_Y(U_1^{t+1}) \mid U_1^t]}. \tag{5.32}$$
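A short numerical illustration of (5.31) and (5.32): when the conditional variance $\sigma^2_Y$ takes a few possible values next period and the conditional mean is predetermined as in (5.30), the kurtosis of the innovation is $3[1 + (c_t^Y)^2]$, where $c_t^Y$ is the coefficient of variation of $\sigma^2_Y$. The discrete distribution and the function name below are hypothetical.

```python
import numpy as np

def mixture_kurtosis(prob, sigma2):
    """Conditional kurtosis 3 * (1 + c**2) of a normal variance mixture, see (5.31)-(5.32).

    prob   : probabilities of the next-period variance values
    sigma2 : possible values of the conditional variance sigma^2_Y
    Returns (kurtosis coefficient, coefficient of variation c).
    """
    mean = prob @ sigma2
    var = prob @ (sigma2 - mean) ** 2
    c = np.sqrt(var) / mean
    return 3.0 * (1.0 + c**2), c

prob = np.array([0.5, 0.5])
sigma2 = np.array([0.01, 0.04])      # hypothetical low and high variance regimes
kurtosis, c = mixture_kurtosis(prob, sigma2)

# Direct computation from the fourth moment, 3 E[sigma^4] / (E[sigma^2])^2.
kurtosis_direct = 3.0 * (prob @ sigma2**2) / (prob @ sigma2) ** 2
```

Any dispersion in the conditional variance pushes the coefficient above the Gaussian value of 3, even though returns are normal given the state variables.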

This expression emphasizes that the conditional normality assumption does not preclude conditional leptokurtosis with respect to a smaller set of conditioning information. It should be emphasized that formula (5.31) allows for even more

24 It corresponds to the formula given by Gallant, Hsieh and Tauchen (1991) on page 204.


leptokurtosis than the standard formula since the probability distributions considered are still conditioned on a large information set, including possibly unobserved components. An additional projection on the reduced information set defined by past and current values of observed asset returns will increase the kurtosis coefficient. In other words, our model allows for innovation terms in asset returns that, even standardized by a genuine stochastic volatility (including a mixture effect), are still leptokurtic. Moreover, condition (5.30) is likely not to hold, providing an additional degree of freedom in our representation of kurtosis dynamics. If we consider the stock return itself instead of the dividend growth, the violation of (5.30) is even more likely since $m_Y(U_1^{t+1})$ is to be replaced by the “expected” return $m_Y(U_1^{t+1}) + \log[(\varphi(U_1^{t+1}) + 1)/\varphi(U_1^t)]$. Condition (5.30) will be violated when this expected return differs from its expected value computed by investors according to our equilibrium asset pricing model, that is $E[m_Y(U_1^{t+1}) + \log[(\varphi(U_1^{t+1}) + 1)/\varphi(U_1^t)] \mid U_1^t]$. We will show now that it is precisely this difference which can produce a genuine leverage effect in stock returns, as defined by Black (1976) and Nelson (1991) for conditionally heteroskedastic returns.²⁵ This justifies a posteriori the use of the expression leverage effect in Section 5.2 to account for the fact that the probability distribution of $(X_{t+1}, Y_{t+1})$ given $U_1^{t+1}$ depends (through the functions $m_X$, $m_Y$, $\sigma_X$, $\sigma_Y$ and $\sigma_{XY}$) on the contemporaneous value $U_{t+1}$ of the state process.²⁶ According to the standard terminology, the stochastic volatility dividend process exhibits a leverage effect if and only if:

$$\mathrm{Cov}[\eta_{Y,t+1}, h_{t+1}^Y \mid U_1^t] = \mathrm{Cov}[m_Y(U_1^{t+1}), h_{t+1}^Y \mid U_1^t] < 0. \tag{5.33}$$

Barring the restriction (5.30), if $m_Y(U_1^{t+1})$ is truly a function of $U_{t+1}$, the condition in (5.33) amounts to the negativity of the sum of two terms:

$$\mathrm{Cov}\left[ m_Y(U_1^{t+1}),\ \mathrm{Var}[m_Y(U_1^{t+2}) \mid U_1^{t+1}] \,\big|\, U_1^t \right] \tag{5.34}$$

and:

$$\mathrm{Cov}\left[ m_Y(U_1^{t+1}),\ E[\sigma^2_Y(U_1^{t+2}) \mid U_1^{t+1}] \,\big|\, U_1^t \right]. \tag{5.35}$$

In other words, the leverage effect of the stochastic volatility process $Y_{t+1}$ can be produced by either of the following two leverage effects, or by both.²⁷ The conditional

25 We will conduct the discussion below in terms of $m_Y(U_1^{t+1})$ but it could be reinterpreted in terms of $m_Y(U_1^{t+1}) + \log[(\varphi(U_1^{t+1}) + 1)/\varphi(U_1^t)]$.

26 The key point is that the mean functions $m_X(U_1^{t+1})$ and $m_Y(U_1^{t+1})$ depend on $U_{t+1}$. However, if these functions are replaced by the shifted conditional expectations for asset returns according to (5.9), the functions $\sigma_X(U_1^{t+1})$, $\sigma_Y(U_1^{t+1})$ and $\sigma_{XY}(U_1^{t+1})$ will be reintroduced in these expected returns through the functions $\lambda(U_1^{t+1})$ and $\varphi(U_1^{t+1})$ defined by Proposition 5.3.

27 This decomposition of the leverage effect in two terms is the exact analogue of the decomposition discussed in Fiorentini and Sentana (1998) and Meddahi (1999) for persistence.


mean process m Y (U1t+1 ) may be a stochastic volatility process which features a leverage effect defined by the negativity of (5.34). Or the process Yt+1 itself may be characterized by a leverage effect and then (5.35) be negative, which means that bad news about expected returns (when m Y (U1t+1 ) is smaller than its unconditional expectations) implies on average a higher expected volatility of Y , that is a value of E[σ 2Y (U1t+2 )|U1t+1 ] greater than its unconditional mean. To summarize, Assumption 5.4 not only allows us to capture the standard features of a stochastic volatility model (in terms of heavy tails and leverage effects) but also provides for a richer set of possible dynamics. Moreover, we can certainly extend these ideas to multivariate dynamics either for the joint behavior of market and stock returns or for any portfolio consideration. For instance, the dependence of σ X Y (U1t+1 ) on the whole set of state variables offers great flexibility to model the stochastic behavior of correlation coefficients, as recently put forward empirically by Andersen et al. (1999). This last feature is clearly highly relevant for asset allocation or conditional beta pricing models.
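The two-term decomposition of the leverage condition, (5.34) plus (5.35), can be computed explicitly in a toy example. The sketch below assumes, beyond what the text states, that the state variable is a finite Markov chain with transition matrix `trans` and that $m_Y$ and $\sigma^2_Y$ depend only on the arrival state. Names and numbers are hypothetical; state 1 is taken to be the "bad news" state with a low conditional mean and a high conditional variance.

```python
import numpy as np

def cond_cov(trans, a, b):
    """Cov[a(U'), b(U') | U = u] for every current state u of the chain."""
    return trans @ (a * b) - (trans @ a) * (trans @ b)

def leverage_terms(trans, m_y, s2_y):
    """Return the terms (5.34), (5.35) and their total, Cov[m_Y, h^Y | U], by state."""
    var_m = cond_cov(trans, m_y, m_y)        # Var[m_Y(U'') | U'] as a function of U'
    exp_s2 = trans @ s2_y                    # E[sigma^2_Y(U'') | U'] as a function of U'
    term_mean_vol = cond_cov(trans, m_y, var_m)          # (5.34)
    term_vol = cond_cov(trans, m_y, exp_s2)              # (5.35)
    total = cond_cov(trans, m_y, var_m + exp_s2)         # left-hand side of (5.33)
    return term_mean_vol, term_vol, total

trans = np.array([[0.9, 0.1], [0.3, 0.7]])
m_y = np.array([0.02, -0.01])    # state 1 carries bad news about the mean
s2_y = np.array([0.01, 0.04])    # and a higher conditional variance
term_mean_vol, term_vol, total = leverage_terms(trans, m_y, s2_y)
```

With these values both terms are negative in each current state, so the condition (5.33) holds; changing the sign of the spread in `s2_y` would flip the sign of the second term.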

6 Conclusion

In this chapter, we provided a unifying analysis of latent variable models in finance through the concept of stochastic discount factor (SDF). We extended both the asset pricing factor models and the equilibrium dynamic asset pricing models through a conditioning on state variables. This conditioning enriches the dynamics of asset returns through instantaneous causality between the asset returns and the latent variables. Such correlation or leverage effects explain departures from usual CAPM pricing for stocks or Black and Scholes and Hull and White pricing for options. The dependence of conditional covariances on the state variables allows for a rich dynamic stochastic behavior of correlation coefficients which is important for asset allocation or value-at-risk strategies. The enriched set of empirical implications from such dynamic latent variable models requires us to set up a general inference methodology which will account for the unobservability of both cross-sectional factors and longitudinal latent variables. Indirect inference, efficient method of moments or Markov chain Monte Carlo (MCMC) for Bayesian inference are all avenues that can prove useful in this context, since they have been used successfully in stochastic volatility models.

References Amin, K.I. and Jarrow, R. (1992), Pricing options in a stochastic interest rate economy, Mathematical Finance, 3(3), 1–21. Amin, K.I. and Ng, V.K. (1993), Option Valuation with Systematic Stochastic Volatility, Journal of Finance, XLVIII, 3, 881–909.

5. Latent Variable Models for SDFs

183

Andersen, T.B., Bollerslev, T., Diebold, F.X. and Labys, P. (1999), The distribution of exchange rate volatility, NBER Working Paper no. 6961. Bansal, R., Hsieh, D. and Viswanathan, S. (1993), No arbitrage and arbitrage pricing: a new approach, Journal of Finance 48, 1231–62. Bartholomew, D.J. (1987), Latent Variable Models and Factor Analysis. Oxford University Press, Oxford. Black, F. (1976), Studies of stock market volatility Changes, 1976 Proceedings of the American Statistical Association, Business and Economic Statistics Section, pp. 177–81. Black, F. and Scholes, M. (1973), The pricing of options and corporate liabilities, Journal of Political Economy 81, 637–59. Breeden, D. and Litzenberger, R. (1978), Prices of state-contingent claims implicit in option prices, Journal of Business 51, 621–51. Burt, C. (1941), The Factors of the Mind: An Introduction to factor Analysis in Psychology. Macmillan, New York. Chamberlain, G. and Rothschild, M. (1983), Arbitrage and mean variance analysis on large asset markets, Econometrica 51, 1281–304. Clark, P.K. (1973), A subordinated stochastic process model with variance for speculative prices, Econometrica 41, 135–56. Cox, D.R. (1981), Statistical analysis of time series: some recent developments, Scandinavian Journal of Statistics 8, 93–115. Cox, J., Ingersoll, J. and Ross, S. (1981), A reexamination of traditional hypotheses about the term structure of interest rates, Journal of Finance 36, 769–99. Dai, Q. and Singleton, K.J. (1999), Specification analysis of term structure models, forthcoming in the Journal of Finance. Diebold, F.X. and Nerlove, M. (1989), The dynamics of exchange rate volatility: a multivariate latent factor ARCH model, Journal of Applied Econometrics 4, 1–21. Duan, J.C. (1995), The GARCH option pricing model, Mathematical Finance 5, 13–32. Duffie D. and Kan, R. (1996), A yield-factor model of interest rates, Mathematical Finance, 379–406. Engle, R.F., Ng, V. and Rothschild, M. 
(1990), Asset pricing with a factor arch covariance structure: empirical estimates with treasury bills, Journal of Econometrics 45, 213–38. Epstein, L. and Zin, S. (1989), Substitution, risk aversion and the temporal behavior of consumption and asset returns I: a theoretical framework, Econometrica 57, 937–69. Epstein, L. and Zin, S. (1991), Substitution, risk aversion and the temporal behavior of consumption and asset returns I: an empirical analysis, Journal of Political Economy 99, 2, 263–86. Ferson, W.E. and Korajczyk, R.A. (1995), Do arbitrage pricing models explain the predictability of stock returns, Journal of Business 68, 309–49. Fiorentini, G. and Sentana, E. (1998), Conditional means of time series processes and time series processes for conditional means, International Economic Review 39, 1101–18. Florens, J.-P. and Mouchart, M. (1982), A note on noncausality, Econometrica 50(3), 583–91. Florens, J.-P., Mouchart, M. and J.-Rollin, P. (1990), Elements of Bayesian Statistics. Dekker, New York. Gallant, A.R., Hsieh, D. and Tauchen, G. (1991), on fitting a recalcitrant series: the pound/dollar exchange rate 1974–1983, Nonparametric and Semiparametric Methods in Econometrics and Statistics, (eds. William Barnett, A., Jim Powell and


Georges Tauchen), Cambridge University Press, Cambridge. Garcia R., Luger, R. and Renault, E. (1999), Asymmetric smiles, leverage effects and structural parameters, working paper, CIRANO, Montreal, Canada. Ghysels, E., Harvey, A. and Renault, E. (1996), Stochastic Volatility, Statistical Methods in Finance (C. Rao, R. and Maddala, G.S.). North-Holland, Amsterdam, pp. 119–91. Granger, C.W.J. (1969), Investigating causal relations by econometric models and cross-spectral methods, Econometrica 37, 424–38. Hamilton, J.D. (1989), A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica 57, 357–84. Hansen, L. and Richard, S. (1987), The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models, Econometrica 55, 587–614. Harrison, J.M. and Kreps, D. (1979), Martingale and Arbitrage in Multiperiod Securities Markets, Journal of Economic Theory 20, 381–408. Harvey, A. (1989), Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge. Harvey, C.R. (1991), The world price of covariance risk, Journal of Finance 46, 111–57. Hull, J. and White, A. (1987), The pricing of options on assets with stochastic volatilities, Journal of Finance XLII, 281–300. Kallsen, J. and Taqqu, M.S. (1998), Option pricing in ARCH-type models, Mathematical Finance, 13–26. King, M., Sentana, E. and Wadhwani, S. (1994), Volatility and links between national stock markets, Econometrica 62, 901–33. Lintner, J. (1965), The Valuation of risk assets and the selection of risky investments in stock portfolio and capital budgets, Review of Economics and Statistics 47, 13–37. Kreps, D. and Porteus, E. (1978), Temporal resolution of uncertainty and dynamic choice theory, Econometrica 46, 185–200. Lucas, R. (1978), Asset prices in an exchange economy, Econometrica 46, 1429–45. Meddahi, N. 
(1999), Aggregation of long memory processes, unpublished paper, Universit´e de Montr´eal. Meddahi, N. and Renault, E. (1996), Aggregation and marginalization of GARCH and stochastic volatility models, GREMAQ DP 96.30.433, Toulouse. Merton, R.C. (1973), Rational theory of option pricing, Bell Journal of Economics and Management Science 4, 141–83. Nelson, D.B. (1991), Conditional heteroskedasticity in asset returns: a new approach, Econometrica 59, 347–70. Pitt, M.K. and Shephard, N. (1999), Time-varying covariances: a factor stochastic volatility approach, Bayesian Statistics 6, 547–70. Renault, E. (1999), Dynamic Factor Models in Finance, Core Lectures. Oxford University Press, Oxford, forthcoming. Ross, S. (1976), The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341–60. Sharpe, W.F. (1964), Capital asset prices: a theory of market equilibrium under conditions of risk, Journal of Finance 19, 425–42. Sims, C.A. (1972), Money, income and causality, American Economic Review 62, 540–52. Spearman, C. (1927), The Abilities of Man. Macmillan, New York. Turnbull, S. and Milne, F. (1991), A simple approach to interest-rate option pricing, Review of Financial Studies 4, 87–121.

6 Monte Carlo Methods for Security Pricing∗ Phelim Boyle, Mark Broadie and Paul Glasserman

1 Introduction

In recent years the complexity of numerical computation in financial theory and practice has increased enormously, putting more demands on computational speed and efficiency. Numerical methods are used for a variety of purposes in finance. These include the valuation of securities, the estimation of their sensitivities, risk analysis, and stress testing of portfolios. The Monte Carlo method is a useful tool for many of these calculations, evidenced in part by the voluminous literature of successful applications. For a brief sampling, the reader is referred to the stochastic volatility applications in Duan (1995), Hull and White (1987), Johnson and Shanno (1987), and Scott (1987);¹ the valuation of mortgage-backed securities in Schwartz and Torous (1989); the valuation of path-dependent options in Kemna and Vorst (1990); the portfolio optimization in Worzel et al. (1994); and the valuation of interest-rate derivative claims in Carverhill and Pang (1995).

∗ Reprinted from the Journal of Economic Dynamics and Control 21 (1997) 1267–1321.
¹ Wiggins (1987) also studies pricing under stochastic volatility but does not use Monte Carlo simulation.

In this paper we focus on recent methodological developments. We review the Monte Carlo approach and describe some recent applications in the finance area.

In modern finance, the prices of the basic securities and the underlying state variables are often modelled as continuous-time stochastic processes. A derivative security, such as a call option, is a security whose payoff depends on one or more of the basic securities. Using the assumption of no arbitrage, financial economists have shown that the price of a generic derivative security can be expressed as the expected value of its discounted payouts. This expectation is taken with respect to a transformation of the original probability measure known as the equivalent martingale measure or the risk-neutral measure. The book by Duffie (1996) provides an excellent account of this material.

The Monte Carlo method lends itself naturally to the evaluation of security prices represented as expectations. Generically, the approach consists of the following steps:
• Simulate sample paths of the underlying state variables (e.g., underlying asset prices and interest rates) over the relevant time horizon. Simulate these according to the risk-neutral measure.
• Evaluate the discounted cash flows of a security on each sample path, as determined by the structure of the security in question.
• Average the discounted cash flows over sample paths.

In effect, this method computes a multi-dimensional integral – the expected value of the discounted payouts over the space of sample paths. The increase in the complexity of derivative securities in recent years has led to a need to evaluate high dimensional integrals.

Monte Carlo becomes increasingly attractive compared to other methods of numerical integration as the dimension of the problem increases. Consider the integral of the function $f(x)$ over the $d$-dimensional unit hypercube. The simple (or crude) Monte Carlo estimate of the integral is equal to the average value of the function $f$ over $n$ points selected at random² from the unit hypercube. From the strong law of large numbers this estimate converges to the true value of the integral as $n$ tends to infinity. In addition, the central limit theorem assures us that the standard error³ of the estimate tends to zero as $1/\sqrt{n}$. Thus the error convergence rate is independent of the dimension of the problem and this is the dominant advantage of the method over classical numerical integration approaches. The only restriction on the function $f$ is that it should be square integrable, and this is a relatively mild restriction.

Furthermore, the Monte Carlo method is flexible and easy to implement and modify. In addition, the increased availability of powerful computers has enhanced the attractiveness of the method.

There are some disadvantages of the method but in recent years progress has been made in overcoming them. One drawback is that for very complex problems a large number of replications may be required to obtain precise results.
Different variance reduction techniques have been developed to increase precision. Two of the classical variance reduction techniques are the control variate approach and the antithetic variate method. More recently, moment matching, importance sampling, and conditional Monte Carlo methods have been introduced in finance applications.

² In standard Monte Carlo applications the $n$ points are usually not truly random but are generated by a deterministic algorithm and are described as pseudorandom numbers.
³ We can readily estimate the variance of the Monte Carlo estimate by using the same set of $n$ random numbers to estimate the expected value of $f^2$.

Another technique for speeding up the valuation of multidimensional integrals uses deterministic sequences rather than random sequences. These deterministic sequences are chosen to be more evenly dispersed throughout the region of integration than random sequences. If we use these sequences to estimate multidimensional integrals we can often improve the convergence. Deterministic sequences with this property are known as low-discrepancy sequences or quasi-random sequences. Using this approach one can in theory derive deterministic error bounds, though the practical use of the bounds is problematic. In contrast, standard Monte Carlo yields simple, useful probabilistic error bounds. Although low-discrepancy sequences are well known in computational physics they have only recently been applied in finance problems. There are different procedures for generating such low-discrepancy sequences and these procedures are generally based on number theoretic methods. We describe some of the recent developments in this area. We also discuss applications of this approach to problems in finance and conduct some rough comparisons between standard Monte Carlo methods and two different quasi-random approaches.

Until recently, the valuation of American style options was widely considered outside the scope of Monte Carlo. However, Tilley (1993), Barraquand and Martineau (1995), and Broadie and Glasserman (1997) have proposed approaches to this problem, and there has been other related work as well. We provide a brief survey of the recent research progress in this area.

The layout of the paper is as follows. Variance reduction techniques are described in the next section. The ideas behind the use of low-discrepancy sequences and brief numerical comparisons with standard Monte Carlo methods are given in Section 3. Price sensitivity estimation using simulation is discussed in Section 4. Various approaches to pricing American options using simulation are briefly described in Section 5. Other issues are touched on briefly in Section 6.
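As a rough illustration of the crude Monte Carlo estimate described in this introduction, the following Python sketch averages a test function over pseudorandom points in the $d$-dimensional unit hypercube and computes a standard error from the same sample. The integrand `product_integrand`, the function names, and the choice of NumPy's pseudorandom generator are assumptions made for illustration only; they do not come from the paper.

```python
import numpy as np

def product_integrand(x):
    """Hypothetical test integrand f(x) = prod_k 2*x_k.

    Each factor integrates to 1 on [0, 1], so the integral over the
    unit hypercube is exactly 1 in any dimension.
    """
    return np.prod(2.0 * x, axis=1)

def crude_monte_carlo(f, d, n, rng):
    """Average f over n pseudorandom points in [0, 1]^d.

    Returns the estimate and its standard error; the standard error is
    computed from the sample standard deviation of the same n values.
    """
    points = rng.random((n, d))   # n points, uniform on the unit hypercube
    values = f(points)
    estimate = values.mean()
    std_error = values.std(ddof=1) / np.sqrt(n)
    return estimate, std_error

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (1_000, 100_000):
        est, se = crude_monte_carlo(product_integrand, d=5, n=n, rng=rng)
        print(f"n={n:>7}: estimate={est:.4f}, standard error={se:.4f}")
```

Under these assumptions, raising $n$ by a factor of 100 should shrink the reported standard error by roughly a factor of 10, whatever the value of $d$.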

2 Variance reduction techniques

In this section, we first discuss the role of variance reduction in meeting the broader objective of improving the computational efficiency of Monte Carlo simulations. We then discuss specific variance reduction techniques and illustrate their application to pricing problems.

2.1 Variance reduction and efficiency improvement

The reduction of variance seems so obviously desirable that the precise argument for its benefit is sometimes overlooked. We briefly review the underlying justification for variance reduction and examine it from the perspective of improving computational efficiency.


Suppose we want to compute a parameter $\theta$ – for example, the price of a derivative security. Suppose we can generate by Monte Carlo an i.i.d. sequence $\{\hat\theta_i, i = 1, 2, \ldots\}$, where each $\hat\theta_i$ has expectation $\theta$ and variance $\sigma^2$. A natural estimator of $\theta$ based on $n$ replications is then the sample mean

$$\frac{1}{n}\sum_{i=1}^{n}\hat\theta_i.$$

By the central limit theorem, for large $n$ this sample mean is approximately normally distributed with mean $\theta$ and variance $\sigma^2/n$. Probabilistic error bounds in the form of confidence intervals follow readily from the normal approximation, and indicate that the error in the estimator is proportional to $\sigma/\sqrt{n}$. Thus, decreasing the variance $\sigma^2$ by a factor of 10, say, while leaving everything else unchanged, does as much for error reduction as increasing the number of samples by a factor of 10.

Suppose, now, that we have a choice between two types of Monte Carlo estimates which we denote by $\{\hat\theta_i^{(1)}, i = 1, 2, \ldots\}$ and $\{\hat\theta_i^{(2)}, i = 1, 2, \ldots\}$. Suppose that both are unbiased, so that $E[\hat\theta_i^{(1)}] = E[\hat\theta_i^{(2)}] = \theta$, but $\sigma_1 < \sigma_2$, where $\sigma_j^2 = \mathrm{Var}[\hat\theta^{(j)}]$, $j = 1, 2$. From our previous observations it follows that a sample mean of $n$ replications of $\hat\theta^{(1)}$ gives a more precise estimate of $\theta$ than does a sample mean of $n$ replications of $\hat\theta^{(2)}$. But this analysis oversimplifies the comparison because it fails to capture possible differences in the computational effort required by the two estimators. Generating $n$ replications of $\hat\theta^{(1)}$ may be more time-consuming than generating $n$ replications of $\hat\theta^{(2)}$; smaller variance is not sufficient grounds for preferring one estimator over another.

To compare estimators with different computational requirements as well as different variances, we argue as follows. Suppose the work required to generate one replication of $\hat\theta^{(j)}$ is a constant $b_j$, $j = 1, 2$. (In some problems, the work per replication is stochastic; assuming it is constant simplifies the discussion.) With computing time $t$, the number of replications of $\hat\theta^{(j)}$ that can be generated is $\lfloor t/b_j \rfloor$; for simplicity, we drop the $\lfloor\cdot\rfloor$ and treat the ratios $t/b_j$ as though they were integers. The two estimators available with computing time $t$ are therefore

$$\frac{b_1}{t}\sum_{i=1}^{t/b_1}\hat\theta_i^{(1)} \quad\text{and}\quad \frac{b_2}{t}\sum_{i=1}^{t/b_2}\hat\theta_i^{(2)}.$$

For large $t$, these are approximately normally distributed with mean $\theta$ and with standard deviations

$$\sigma_1\sqrt{\frac{b_1}{t}} \quad\text{and}\quad \sigma_2\sqrt{\frac{b_2}{t}}.$$


Thus, for large $t$, the first estimator should be preferred over the second if

$$\sigma_1^2 b_1 < \sigma_2^2 b_2. \tag{1}$$

Equation (1) provides a sound basis for trading off estimator variance and computational requirements. In light of the discussion leading to (1), it is reasonable to take the product of variance and work per run as a measure of efficiency. Using efficiency as a basis for comparison, the lower-variance estimator should be preferred only if the variance ratio $\sigma_1^2/\sigma_2^2$ is smaller than the work ratio $b_2/b_1$. By the same argument, a higher-variance estimator may actually be preferable if it takes much less time to generate.

In its simplest form, the principle expressed in (1) dates at least to Hammersley and Handscomb (1964, p. 22). More recently, the idea has been substantially extended by Glynn and Whitt (1992). They allow the work per run to be random (in which case each $b_j$ is the expected work per run) and also consider efficiency in the presence of bias.
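The comparison in (1) can be sketched in a few lines of Python. The function names and the numerical values below are hypothetical, chosen only to show a case in which the higher-variance estimator is the more efficient one.

```python
import math

def std_dev_with_budget(sigma_sq, work_per_rep, budget):
    """Approximate standard deviation of the sample mean when `budget`
    units of work are spent on replications costing `work_per_rep`
    each: sigma * sqrt(b / t)."""
    return math.sqrt(sigma_sq * work_per_rep / budget)

def preferred_estimator(sigma1_sq, b1, sigma2_sq, b2):
    """Apply criterion (1): return 1 if sigma_1^2 * b_1 < sigma_2^2 * b_2,
    otherwise return 2."""
    return 1 if sigma1_sq * b1 < sigma2_sq * b2 else 2

if __name__ == "__main__":
    # Hypothetical case: estimator 1 has half the variance of estimator 2
    # but costs three times as much work per replication.
    s1, b1, s2, b2, t = 1.0, 3.0, 2.0, 1.0, 10_000.0
    print("preferred estimator:", preferred_estimator(s1, b1, s2, b2))
    print("std dev, estimator 1:", std_dev_with_budget(s1, b1, t))
    print("std dev, estimator 2:", std_dev_with_budget(s2, b2, t))
```

In this made-up case the variance ratio is 1/2 while the work ratio $b_2/b_1$ is 1/3, so the criterion selects the second estimator even though its variance is larger.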

2.2 Antithetic variates

Equipped with a basis for evaluating potential efficiency improvements, we can now consider specific variance reduction techniques. One of the simplest and most widely used techniques in financial pricing problems is the method of antithetic variates. We introduce it with a simple example, then generalize.

Consider the problem of computing the Black–Scholes price of a European call option on a no-dividend stock. Of course, there is no need to evaluate this price by simulation, but the example serves as a useful introduction. In the Black–Scholes model, the stock price follows a lognormal diffusion. Independent replications of the terminal stock price under the risk-neutral measure can be generated from the formula

$$S_T^{(i)} = S_0\, e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z_i}, \quad i = 1, \ldots, n, \tag{2}$$

where $S_0$ is the current stock price, $r$ is the riskless interest rate, $\sigma$ is the stock's volatility, $T$ is the option's maturity, and the $\{Z_i\}$ are independent samples from the standard normal distribution. See, e.g., Hull (2000) for background on this model, and see Devroye (1986) for methods of sampling from the normal distribution. Based on $n$ replications, an unbiased estimator of the price of an option with strike $K$ is given by

$$\hat C = \frac{1}{n}\sum_{i=1}^{n} C_i \equiv \frac{1}{n}\sum_{i=1}^{n} e^{-rT}\max\{0, S_T^{(i)} - K\}. \tag{3}$$
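Equations (2) and (3) translate almost directly into code. The sketch below is illustrative only: the function names are hypothetical, the parameter values in the usage section are borrowed from the caption of Table 1, and the closed-form Black–Scholes price is included purely as a check on the simulation.

```python
import math
import numpy as np

def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    norm_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def mc_call(s0, k, r, sigma, t, n, rng):
    """Standard Monte Carlo estimate of the call price from n replications.

    Returns the estimate (3) and its standard error.
    """
    z = rng.standard_normal(n)
    # Terminal stock prices under the risk-neutral measure, equation (2).
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
    # Discounted payoffs C_i.
    c = math.exp(-r * t) * np.maximum(0.0, s_t - k)
    return c.mean(), c.std(ddof=1) / math.sqrt(n)

if __name__ == "__main__":
    params = dict(s0=100.0, k=100.0, r=0.10, sigma=0.2, t=0.2)
    est, se = mc_call(n=100_000, rng=np.random.default_rng(0), **params)
    print(f"Monte Carlo: {est:.4f} (standard error {se:.4f})")
    print(f"Closed form: {black_scholes_call(**params):.4f}")
```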


In this context, the method of antithetic variates⁴ is based on the observation that if $Z_i$ has a standard normal distribution, then so does $-Z_i$. The price $\tilde S_T^{(i)}$ obtained from (2) with $Z_i$ replaced by $-Z_i$ is thus a valid sample from the terminal stock price distribution. Similarly, each $\tilde C_i = e^{-rT}\max\{0, \tilde S_T^{(i)} - K\}$ is an unbiased estimator of the option price, as is therefore

$$\hat C_{AV} = \frac{1}{n}\sum_{i=1}^{n}\frac{C_i + \tilde C_i}{2}.$$

⁴ This method was introduced to option pricing in Boyle (1977), where its use was illustrated in the pricing of a European call on a dividend-paying stock.

A heuristic argument for preferring $\hat C_{AV}$ notes that the random inputs obtained from the collection of antithetic pairs $\{(Z_i, -Z_i)\}$ are more regularly distributed than a collection of $2n$ independent samples. In particular, the sample mean over the antithetic pairs always equals the population mean of 0, whereas the mean over finitely many independent samples is almost surely different from 0. If the inputs are made more regular, it may be hoped that the outputs are more regular as well. Indeed, a large value of $S_T^{(i)}$ resulting from a large $Z_i$ will be paired with a small value of $\tilde S_T^{(i)}$ obtained from $-Z_i$.

A more precise argument compares efficiencies. Because $C_i$ and $\tilde C_i$ have the same variance,

$$\mathrm{Var}\left[\frac{C_i + \tilde C_i}{2}\right] = \frac{1}{2}\left(\mathrm{Var}[C_i] + \mathrm{Cov}[C_i, \tilde C_i]\right). \tag{4}$$

Thus, we have $\mathrm{Var}[\hat C_{AV}] \le \mathrm{Var}[\hat C]$ if $\mathrm{Cov}[C_i, \tilde C_i] \le \mathrm{Var}[C_i]$. However, $\hat C_{AV}$ uses twice as many replications as $\hat C$, so we must account for differences in computational requirements. If generating the $Z_i$ takes a negligible fraction of the work per replication (which would typically be the case in the pricing of a more elaborate option), then the work to generate $\hat C_{AV}$ is roughly double the work to generate $\hat C$. Thus, for antithetics to increase efficiency, we require

$$2\,\mathrm{Var}[\hat C_{AV}] \le \mathrm{Var}[\hat C],$$

which, in light of (4), simplifies to the requirement that $\mathrm{Cov}[C_i, \tilde C_i] \le 0$.

That this condition is met is easily demonstrated. Define $\phi$ so that $C_i = \phi(Z_i)$; $\phi$ is the composition of the mappings from $Z_i$ to the stock price and from the stock price to the discounted option payoff. As the composition of two increasing functions, $\phi$ is monotone, so by a standard inequality (e.g., Section 2.2 of Barlow and Proschan 1975)

$$E[\phi(Z_i)\phi(-Z_i)] \le E[\phi(Z_i)]\,E[\phi(-Z_i)], \tag{5}$$

i.e., $\mathrm{Cov}[C_i, \tilde C_i] \equiv E[\phi(Z_i)\phi(-Z_i)] - E[\phi(Z_i)]E[\phi(-Z_i)] \le 0$, and we may conclude that antithetics help.

This argument can be adapted to show that the method of antithetic variates increases efficiency in pricing a European put and other options that depend monotonically on inputs (e.g., Asian options). The notable departure from monotonicity in some barrier options (e.g., a down-and-in call) suggests that the use of antithetics in pricing these options may sometimes be less effective.

In computing confidence intervals with antithetic variates, it is essential that the standard error be estimated using the sample standard deviation of the $n$ averaged pairs $(C_i + \tilde C_i)/2$ and not the $2n$ individual observations $C_1, \tilde C_1, \ldots, C_n, \tilde C_n$. The averaged pairs are independent but the individual observations are not. This is a case (we will see others shortly) in which the use of a variance reduction technique affects the estimation of the standard error and, in particular, requires some “batching” of observations to deal with dependence.

It is worth noting that the method of antithetic variates is by no means restricted to simulations whose only stochastic inputs are standard normal variates. The most primitive stochastic input in most simulations is a sequence $\{U_n\}$ of independent variates uniformly distributed on the unit interval. In this case, $1 - U_n$ has the same distribution as $U_n$, and the pair $(U_n, 1 - U_n)$ are called antithetic because they exhibit negative dependence. If the simulation output depends monotonically on the input random numbers, then the output obtained from $\{1 - U_1, 1 - U_2, \ldots\}$ will be negatively correlated with that obtained from $\{U_1, U_2, \ldots\}$, resulting in increased efficiency compared with independent replications.
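A minimal Python sketch of the antithetic estimator for the European call example follows. The function name and the returned tuple are hypothetical choices for illustration. The standard error is computed from the $n$ averaged pairs, as the discussion of confidence intervals requires, and the sample covariance between $C_i$ and $\tilde C_i$ is returned so that the efficiency condition $\mathrm{Cov}[C_i, \tilde C_i] \le 0$ can be inspected.

```python
import math
import numpy as np

def antithetic_call(s0, k, r, sigma, t, n, rng):
    """Antithetic-variates estimate of a European call from n pairs (Z_i, -Z_i).

    Returns (estimate, standard error, sample covariance of C_i and C~_i).
    """
    z = rng.standard_normal(n)
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    # Discounted payoffs from Z_i and from the antithetic draws -Z_i.
    c = disc * np.maximum(0.0, s0 * np.exp(drift + vol * z) - k)
    c_tilde = disc * np.maximum(0.0, s0 * np.exp(drift - vol * z) - k)
    pair_avg = 0.5 * (c + c_tilde)
    estimate = pair_avg.mean()
    # Standard error from the n independent pair averages,
    # not from the 2n dependent individual observations.
    std_error = pair_avg.std(ddof=1) / math.sqrt(n)
    cov = np.cov(c, c_tilde)[0, 1]
    return estimate, std_error, cov

if __name__ == "__main__":
    est, se, cov = antithetic_call(100.0, 100.0, 0.10, 0.2, 0.2,
                                   n=50_000, rng=np.random.default_rng(0))
    print(f"estimate={est:.4f}, standard error={se:.4f}, Cov[C, C~]={cov:.3f}")
```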
For further general background on antithetic variates and other methods based on correlation induction, see Bratley, Fox, and Schrage (1987), Hammersley and Handscomb (1964), Glynn and Iglehart (1988), and references there. For some examples of application in finance, see Boyle (1977), Clewlow and Carverhill (1994), and Hull and White (1987).

2.3 Control variates

The method of control variates is among the most widely applicable, easiest to use, and effective of the variance reduction techniques.⁵ Simply put, the principle underlying this technique is “use what you know.”

⁵ The earliest application of this technique to option pricing is Boyle (1977).

The most straightforward implementation of control variates replaces the evaluation of an unknown expectation with the evaluation of the difference between the unknown quantity and another expectation whose value is known. A specific illustration can be found in the analysis of Boyle and Emanuel (1985) and Kemna and Vorst (1990) of Asian options. Let $P_A$ be the price of an option whose payoff depends on the arithmetic average of the underlying asset. Let $P_G$ be the price of an option equivalent in every respect except that a geometric average replaces the arithmetic average. Most options based on averages use arithmetic averaging, so $P_A$ is of much greater practical value; but whereas $P_A$ is analytically intractable, $P_G$ can often be evaluated in closed form. Can knowledge of $P_G$ be leveraged to compute $P_A$? It can, through the control variate method.

Write $P_A = E[\hat P_A]$ and $P_G = E[\hat P_G]$, where $\hat P_A$ and $\hat P_G$ are the discounted option payoffs for a single simulated path of the underlying asset. Then

$$P_A = P_G + E[\hat P_A - \hat P_G];$$

in other words, $P_A$ can be expressed as the known price $P_G$ plus the expected difference between $\hat P_A$ and $\hat P_G$. An unbiased estimator of $P_A$ is thus provided by

$$\hat P_A^{cv} = \hat P_A + (P_G - \hat P_G). \tag{6}$$

This representation⁶ suggests a slightly different interpretation: $\hat P_A^{cv}$ adjusts the straightforward estimator $\hat P_A$ according to the difference between the known value $P_G$ and the observed value $\hat P_G$. The known error $(P_G - \hat P_G)$ is used as a control in the estimation of $P_A$.

⁶ To go from (6) to Boyle's (1977) example, let $P_G$ be the price of a European call option on a no-dividend stock and let $P_A$ be the corresponding option price in the presence of dividends.

If most of the computational effort goes to generating paths of the underlying asset, then the additional work required to evaluate $\hat P_G$ along with $\hat P_A$ is minor. It therefore seems reasonable to compare variances alone. Since

$$\mathrm{Var}[\hat P_A^{cv}] = \mathrm{Var}[\hat P_A] + \mathrm{Var}[\hat P_G] - 2\,\mathrm{Cov}[\hat P_A, \hat P_G],$$

this method is effective if the covariance between $\hat P_A$ and $\hat P_G$ is large. The numerical results of Kemna and Vorst indicate that this is indeed the case. Fu, Madan, and Wang (1998) have investigated the use of other control variates for Asian options, based on Laplace transform values. These appear to be less strongly correlated with the option price.

A closer examination of (6) reveals that this estimator does not make optimal use of the relation between the two option prices. Consider the family of unbiased estimators

$$\hat P_A^{\beta} = \hat P_A + \beta(P_G - \hat P_G), \tag{7}$$


parameterized by the scalar $\beta$. We have

$$\mathrm{Var}[\hat P_A^{\beta}] = \mathrm{Var}[\hat P_A] + \beta^2\,\mathrm{Var}[\hat P_G] - 2\beta\,\mathrm{Cov}[\hat P_A, \hat P_G].$$

The variance-minimizing $\beta$ is therefore

$$\beta^* = \frac{\mathrm{Cov}[\hat P_A, \hat P_G]}{\mathrm{Var}[\hat P_G]}.$$

Depending on the application, $\beta^*$ may or may not be close to 1, the implicit value in (6). In using an estimator of the form (6), we forgo an opportunity for greater variance reduction. Indeed, whereas (6) may increase or decrease variance, an estimator based on $\beta^*$ is guaranteed not to increase variance, and will result in a strict decrease in variance so long as $\hat P_A$ and $\hat P_G$ are not uncorrelated.

In practice, of course, we rarely know $\beta^*$ because we rarely know $\mathrm{Cov}[\hat P_A, \hat P_G]$. However, given $n$ independent replications $\{(P_{Ai}, P_{Gi}), i = 1, \ldots, n\}$ of the pairs $(\hat P_A, \hat P_G)$ we can estimate $\beta^*$ via regression. At this point we face a choice. Using all $n$ replications to compute an estimate $\hat\beta$ of $\beta^*$ introduces a bias in the estimator

$$\frac{1}{n}\sum_{i=1}^{n} P_{Ai} + \hat\beta\left(P_G - \frac{1}{n}\sum_{i=1}^{n} P_{Gi}\right),$$

and its estimated standard error because of the dependence between $\hat\beta$ and the $P_{Gi}$. Reserving $n_1$ replications for the estimation of $\beta^*$ and the remaining $n - n_1$ replications for the sample mean of the $P_{Gi}$ (typically with $n_1 \ll n$) eliminates the bias but may deteriorate the estimate of $\beta^*$. Neither issue significantly limits the applicability of the method, because the possible bias vanishes as $n$ increases and because the estimate of $\beta^*$ need not be very precise to achieve a reduction in variance.

The advantage of working with (7) over (6) becomes even more pronounced when further controls are introduced. For example, when the asset price is simulated under risk-neutral probabilities, the present value $e^{-rT}E[S_T]$ of the terminal price must equal the current price $S_0$. We can therefore form the estimator

$$\hat P_A + \beta_1(P_G - \hat P_G) + \beta_2(S_0 - e^{-rT}S_T).$$

The variance-minimizing coefficients $(\beta_1^*, \beta_2^*)$ are easily found by multiple regression. This optimization step seems particularly crucial in this case; for whereas one might guess that $\beta_1^*$ is close to 1, it seems unlikely that $\beta_2^*$ would be. Optimizing over the $\beta$s also allows us to exploit controls that are negatively correlated with the option payoff.

For further general background on control variates see Bratley, Fox, and Schrage (1987), Glynn and Iglehart (1988), and Lavenberg and Welch (1981). For examples of control variate applications in finance, see Boyle (1977), Boyle and


Emanuel (1985), Broadie and Glasserman (1996), Carverhill and Pang (1995), Clewlow and Carverhill (1994), Duan (1995), and Kemna and Vorst (1990).

2.4 Moment matching methods

Next we describe a variance reduction technique proposed by Barraquand (1995), who termed it quadratic resampling. His technique is based on moment matching. As before, we introduce it with the simple example of estimating the European call option price on a single asset and then generalize.

Let $Z_i$, $i = 1, \ldots, n$, denote independent standard normals used to drive a simulation. The sample moments of the $n$ $Z$'s will not exactly match those of the standard normal. The idea of moment matching is to transform the $Z$'s to match a finite number of the moments of the underlying population. For example, the first moment of the standard normal can be matched by defining

$$\tilde Z_i = Z_i - \bar Z, \quad i = 1, \ldots, n, \tag{8}$$

where $\bar Z = \sum_{i=1}^{n} Z_i/n$ is the sample mean of the $Z$'s. Note that the $\tilde Z_i$'s are normally distributed if the $Z_i$'s are normal (with variance $1 - 1/n$ rather than 1). However, the $\tilde Z_i$'s are not independent. As before, terminal stock prices are generated from the formula

$$\tilde S_T(i) = S_0\, e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T}\tilde Z_i}, \quad i = 1, \ldots, n.$$

An estimator of the call option price is the average of the $n$ values $\tilde C_i = e^{-rT}\max(\tilde S_T(i) - K, 0)$.

In the standard Monte Carlo method, confidence intervals for the true value $C$ could be estimated from the sample mean and variance of the estimator. This cannot be done here since the $n$ values of $\tilde Z$ are no longer independent, and hence the values $\tilde C_i$ are not independent. This points out one drawback of the moment matching method: confidence intervals are not as easy to obtain.⁷ Indeed, for confidence intervals it appears to be necessary to apply moment matching to independent batches of runs and estimate the standard error from the batch means. This reduces the efficacy of the method compared with matching moments across all runs.

⁷ The point is not merely a minor technical issue. The sample variance of the $\tilde C_i$'s is usually a poor estimate of $\mathrm{Var}[\tilde C_i]$.

Equation (8) showed one way to match the first moment of a distribution with mean zero. If the underlying population does not have a zero mean, transformed $Z$'s could be generated using $\tilde Z_i = Z_i - \bar Z + \mu_Z$, where $\mu_Z$ is the population mean. The idea can easily be extended to match two moments of a distribution. In this case, an appropriate transformation is

$$\tilde Z_i = (Z_i - \bar Z)\frac{\sigma_Z}{s_Z} + \mu_Z, \quad i = 1, \ldots, n, \tag{9}$$


where $s_Z$ is the sample standard deviation of the $Z_i$'s and $\sigma_Z$ is the population standard deviation. Of course, for a standard normal, $\mu_Z = 0$ and $\sigma_Z = 1$. An estimator of the call option price is the average of the $n$ values $\tilde C_i$.

Using the transformation (9), the $\tilde Z_i$'s are not normally distributed even if the $Z_i$'s are normal. Hence, the corresponding $\tilde C_i$ are biased estimators of the true option value. For most financial problems of practical interest, this bias is likely to be small. However, the bias can be arbitrarily large in extreme circumstances (even when only the first moment of the distribution is matched).⁸ The dependence and bias in the moment matching method make it difficult to quantify the improvement in general analytical terms.

⁸ For example, let $Z$ take the values $+1$ or $-1$ with probability one-half. Consider a security which pays $+\$1$ if $Z = 1$ and $-\$x$ if $Z \ne 1$. The expected payoff of the security is $(1 - x)/2$. To estimate this expected payoff by Monte Carlo simulation, draw $n$ samples $Z_i$ according to the prescribed distribution. Then use equation (8) to define $\tilde Z_i$'s which match the first moment. For almost all samples for any large $n$, the estimated expected payoff is $-x$ and the bias is $(1 + x)/2$ in absolute value. This bias does not decrease as $n$ increases. Care must be taken when using equation (8) or (9) when the support of the random variable is not the entire real line. For example, applying (8) or (9) to uniform or exponential random variables could cause the transformed values to fall outside of the relevant domain.

The moment matching method is another example of the idea to “use what you know.” In this simple European option example, the mean and variance of the terminal stock price $S_T$ are also known. So the moment matching idea could be applied to the simulated terminal stock values $S_T(i)$. In this case, to match the first moment, define

$$\tilde S_T(i) = S_T(i) - \bar S_T + \mu_{S_T}, \tag{10}$$

where $\mu_{S_T} = S_0 e^{rT}$ and $\bar S_T$ is the sample mean of the $S_T(i)$'s. To match the first two moments, define

$$\tilde S_T(i) = (S_T(i) - \bar S_T)\frac{\sigma_{S_T}}{s_{S_T}} + \mu_{S_T}, \tag{11}$$

where $\sigma_{S_T} = S_0\sqrt{e^{2rT}(e^{\sigma^2 T} - 1)}$ and $s_{S_T}$ is the sample standard deviation of the $S_T(i)$'s.

Duan and Simonato (1998) use a related method. They apply a multiplicative transformation to asset prices to enforce the martingale property over a finite set of paths.⁹ They apply their method to GARCH option pricing.

⁹ This is equivalent to enforcing put-call parity.

Comparisons of various moment matching strategies are given in Table 1. For this comparison, $n = 100$ simulation trials were used to estimate the European call option price. Standard errors were estimated by re-simulation. That is, $m = 10\,000$ simulation trials were conducted, each one based on $n$ replications of the estimator. The sample standard deviation of the $m$ simulation estimates gives an estimate of the standard error of a single simulation estimate. Root-mean-squared errors are not reported because they are identical to the standard errors for the number of digits reported.
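The transformations (8) and (9) and the re-simulation procedure can be sketched in Python as follows. This is an illustrative sketch, not the code behind Table 1: the function names are hypothetical, the sample standard deviation $s_Z$ is assumed to use the $n - 1$ divisor, and a smaller number of re-simulations $m$ is suggested in the usage section to keep the run short.

```python
import math
import numpy as np

def match_first_moment(z):
    """Equation (8): subtract the sample mean of each batch (row) of z."""
    return z - z.mean(axis=1, keepdims=True)

def match_two_moments(z, mu=0.0, sigma=1.0):
    """Equation (9): shift and rescale each batch so that its sample mean
    is mu and its sample standard deviation is sigma."""
    z_bar = z.mean(axis=1, keepdims=True)
    s_z = z.std(axis=1, ddof=1, keepdims=True)  # assumption: n - 1 divisor
    return (z - z_bar) * sigma / s_z + mu

def batch_call_estimates(z, s0, k, r, sigma, t):
    """One call-price estimate per row of z; each row is a batch of n normals."""
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
    return (math.exp(-r * t) * np.maximum(s_t - k, 0.0)).mean(axis=1)

def resimulated_std_errors(s0, k, r, sigma, t, n, m, rng):
    """Standard errors estimated by re-simulation: the sample standard
    deviation of m independent estimates, each based on n replications."""
    z = rng.standard_normal((m, n))
    args = (s0, k, r, sigma, t)
    return {
        "none": batch_call_estimates(z, *args).std(ddof=1),
        "MM1 (8)": batch_call_estimates(match_first_moment(z), *args).std(ddof=1),
        "MM2 (9)": batch_call_estimates(match_two_moments(z), *args).std(ddof=1),
    }

if __name__ == "__main__":
    result = resimulated_std_errors(100.0, 100.0, 0.10, 0.2, 0.2,
                                    n=100, m=2_000, rng=np.random.default_rng(0))
    for name, se in result.items():
        print(f"{name:>8}: {se:.3f}")
```

Because each row is transformed separately, the rows remain independent batches, which is what makes the standard deviation across rows a legitimate estimate of the standard error.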


P. Boyle, M. Broadie and P. Glasserman

Table 1. Standard errors for European call options.

σ     S0/K    No variance   MM1            MM2            MM1             MM2
              reduction     Equation (8)   Equation (9)   Equation (10)   Equation (11)
0.2   0.9     0.24          0.19           0.11           0.19            0.09
      1.0     0.62          0.29           0.09           0.26            0.10
      1.1     0.93          0.19           0.09           0.15            0.11
0.4   0.9     0.80          0.55           0.24           0.51            0.17
      1.0     1.22          0.66           0.19           0.56            0.23
      1.1     1.61          0.63           0.17           0.48            0.28
0.6   0.9     1.40          0.95           0.38           0.84            0.28
      1.0     1.93          1.10           0.31           0.91            0.39
      1.1     2.38          1.13           0.25           0.85            0.49

All results are based on n = 100 simulation trials. The option parameters are: K = 100, r = 0.10, T = 0.2, with S0 and σ varying as indicated. Standard error estimates are based on m = 10 000 simulations.

The results in Table 1 show that matching two moments can reduce the simulation error by a factor ranging from 2 to 10. Matching two moments dominates matching one moment, but there is not a clear choice between transforming the original standard normals using (9) or the terminal stock prices using (11). Further computational results, not included in Table 1, indicate that the improvement factor with moment matching is essentially constant as n increases. This may seem counterintuitive, since the moment matching adjustments converge to zero as n increases. But the progressively smaller adjustments are equally important in reducing the estimation error as the number of simulation trials increases. For example, the standard error for n = 10 000 simulation trials is one-tenth of the corresponding number for n = 100 reported in Table 1. The moment matching method can be extended to match covariances. For options that depend on multiple assets, the entire covariance structure is typically a simulation input. Barraquand (1995) suggests a method to match the entire covariance structure and reports error reduction factors ranging from two to several hundred for this method applied to pricing options on the maximum of k assets. The moment matching procedure could be applied to matching higher order moments as well. In addition to different methods for transforming random outcomes to match specified moments, additional points could be added as another way to match moments. Whenever a moment is known, it can be used as a control rather than for moment matching. In an appendix, we give a theoretical argument favoring the use of moments as controls rather than for matching.
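The re-simulation experiment behind Table 1 can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: it assumes that (8) and (9) have the same form as (10) and (11) applied to the standard normal inputs, that the terminal price is the usual lognormal $S_T = S_0 \exp((r - \sigma^2/2)T + \sigma\sqrt{T}Z)$, and it uses m = 2000 outer repetitions instead of 10 000. All function names are hypothetical.

```python
import numpy as np

def call_estimate(z, s0, k, r, sigma, t):
    """Average discounted call payoff driven by the normal inputs z."""
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    return np.exp(-r * t) * np.maximum(s_t - k, 0.0).mean()

def match_moments(z, two_moments):
    """Assumed form of (8)/(9) for standard normals (mu_Z = 0, sigma_Z = 1)."""
    centered = z - z.mean()                  # sample mean forced to 0
    if two_moments:
        centered = centered / z.std(ddof=1)  # sample std forced to 1
    return centered

def resimulated_std_errors(n=100, m=2000, seed=0, **option):
    """Standard deviation of m independent estimates, each built from n draws."""
    rng = np.random.default_rng(seed)
    plain, mm1, mm2 = [], [], []
    for _ in range(m):
        z = rng.standard_normal(n)
        plain.append(call_estimate(z, **option))
        mm1.append(call_estimate(match_moments(z, False), **option))
        mm2.append(call_estimate(match_moments(z, True), **option))
    return np.std(plain, ddof=1), np.std(mm1, ddof=1), np.std(mm2, ddof=1)

option = dict(s0=100.0, k=100.0, r=0.10, sigma=0.2, t=0.2)
se_plain, se_mm1, se_mm2 = resimulated_std_errors(**option)
print(se_plain, se_mm1, se_mm2)
```

Because the matched inputs within one batch are dependent, the standard error is taken across batches rather than from the sample variance inside a batch.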

6. Monte Carlo Methods for Security Pricing


2.5 Stratified and Latin hypercube sampling

Like many variance reduction techniques, stratified sampling seeks to make the inputs to simulation more regular than random inputs. In particular, it forces certain empirical probabilities to match theoretical probabilities, just as moment matching forces empirical moments to match theoretical moments. Consider, for example, the generation of 100 normal random variates as inputs to a simulation. The empirical distribution of an independent sample $Z_1, \ldots, Z_{100}$ will look only roughly like the normal density; the tails of the distribution – often the most important part – will inevitably be underrepresented. Stratified sampling can be used to force exactly one observation to lie between the $(i-1)$th and $i$th percentile, $i = 1, \ldots, 100$, and thus produce a better match to the normal distribution. One way to implement this is to generate 100 independent random variates $U_1, \ldots, U_{100}$, uniform on $[0, 1]$, and set $\tilde{Z}_i = N^{-1}((i + U_i - 1)/100)$, $i = 1, \ldots, 100$, where $N^{-1}$ is the inverse of the cumulative normal distribution. This works because $(i + U_i - 1)/100$ falls between the $(i-1)$th and $i$th percentiles of the uniform distribution, and percentiles are preserved by the inverse transform. Of course, $\tilde{Z}_1, \ldots, \tilde{Z}_{100}$ are highly dependent, complicating the estimation of standard errors. Computing confidence intervals with stratified sampling typically requires batching the runs. For example, with a budget of 100 000 replications we might run 100 independent stratified samples each of size 1000, rather than a single stratified sample of size 100 000. To estimate standard errors we must therefore sacrifice some variance reduction, just as with moment matching.

In principle, this approach applies in arbitrary dimensions. To generate a stratified sample from the $d$-dimensional unit hypercube, with $n$ strata in each coordinate, we could generate a sequence of vectors $U_j = (U_j^{(1)}, \ldots, U_j^{(d)})$, $j = 1, 2, \ldots$, and then set

$$V_j = \frac{U_j + (i_1, \ldots, i_d)}{n}, \qquad i_k = 0, \ldots, n-1, \quad k = 1, \ldots, d.$$

Exactly one $V_j$ will lie in each of the $n^d$ cubes defined by the product of the $n$ strata in each coordinate. The difficulty in high dimensions is that generating even a single stratified sample of size $n^d$ may be prohibitive unless $n$ is very small. Latin hypercube sampling can be viewed as a way of randomly sampling $n$ points of a stratified sample while preserving some of the regularity from stratification. The method was introduced by McKay, Conover, and Beckman (1979) and further analyzed in Stein (1987). It works as follows. Let $\pi_1, \ldots, \pi_d$ be independent random permutations of $\{1, \ldots, n\}$, each uniformly distributed over all $n!$ possible permutations. Set

$$V_j^{(k)} = \frac{U_j^{(k)} + \pi_k(j) - 1}{n}, \qquad k = 1, \ldots, d, \quad j = 1, \ldots, n.$$


The randomization ensures that each vector $V_j$ is uniformly distributed over the $d$-dimensional hypercube. At the same time, the coordinates are perfectly stratified in the sense that exactly one of $V_1^{(k)}, \ldots, V_n^{(k)}$ falls between $(j-1)/n$ and $j/n$, $j = 1, \ldots, n$, for each dimension $k = 1, \ldots, d$. As before, the dependence introduced by this method implies that standard errors can be estimated only through batching.

These methods can be viewed as part of a hierarchy of methods introducing additional levels of regularity in inputs at the expense of complicating the estimation of errors. Some, like stratified sampling, fix the size of the sample while others leave flexibility. The extremes of this hierarchy are straightforward Monte Carlo (completely random) and the low-discrepancy methods (completely deterministic) discussed in Section 3. Owen (1995a, 1995b) discusses these and other methods and introduces a hybrid that combines the regularity of low-discrepancy methods with the simple error estimation of standard Monte Carlo. Shaw (1995) uses an extension proposed by Stein (1987) to handle dependent inputs in a novel approach to estimating value at risk.
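The two constructions of this subsection, one-dimensional stratified normals and Latin hypercube points, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the inverse normal comes from Python's standard-library `statistics.NormalDist` rather than from anything named in the text.

```python
import numpy as np
from statistics import NormalDist

def stratified_normals(n, rng):
    """One draw per stratum: Z_i = N^{-1}((i + U_i - 1)/n), i = 1, ..., n."""
    u = rng.random(n)
    v = (np.arange(1, n + 1) + u - 1.0) / n   # v[i-1] lies in [(i-1)/n, i/n)
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in v])

def latin_hypercube(n, d, rng):
    """V_j^(k) = (U_j^(k) + pi_k(j) - 1)/n with independent permutations pi_k."""
    u = rng.random((n, d))
    # Column k holds a uniformly random permutation of 1, ..., n.
    perms = np.column_stack([rng.permutation(n) + 1 for _ in range(d)])
    return (u + perms - 1.0) / n

rng = np.random.default_rng(0)
z = stratified_normals(100, rng)   # exactly one variate per percentile band
v = latin_hypercube(10, 3, rng)    # 10 points in the 3-dimensional unit cube
```

Each column of `v` places exactly one point in each of the 10 strata, while the full points are scattered over the cube rather than filling all 10³ cells.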

2.6 Some numerical comparisons

The variance reduction methods discussed thus far are fairly generic, in the sense that they do not rely on the detailed structure of the security to be priced. This contrasts with the remaining two methods that we discuss – importance sampling and conditional Monte Carlo. These methods must be carefully tailored to each application. It therefore seems appropriate to digress briefly into a numerical comparison of the generic methods on some option pricing problems.

We first examine the performance of these methods in pricing Asian options. The payoff of a discretely sampled arithmetic average Asian option is $\max(\bar{S} - K, 0)$, where $\bar{S} = \sum_{i=1}^{k} S_i/k$, $S_i$ is the asset price at time $t_i = iT/k$, and $T$ is the option maturity. The value of the option is $E[e^{-rT}\max(\bar{S} - K, 0)]$. There is no easily evaluated closed-form expression for this option value. Various formulas to approximate the Asian option price have been developed, but simulation is usually used to test the accuracy of the approximations. For this Asian option, $k$ random numbers are needed to simulate one option payoff, and $nk$ random numbers are needed in total. Moment matching (MM2, for two moments) was applied $k$ times to the $n$ numbers used to generate each $S_i$ at time $t_i$. Latin hypercube sampling (LHS) was applied to sample $n$ points from the $k$-dimensional unit cube. The discretely sampled geometric average Asian price was used as a control variate (see Turnbull and Wakeman 1991 for a closed-form solution for this price). Results appear in Table 2.

The results in Table 2 indicate that matching two moments can reduce the simulation error by a factor ranging from 1 to 10. Using the geometric average Asian option price as a control variate reduces error by a factor ranging from 20 to 100, and is consistently the most effective method. LHS and MM2 perform similarly. Antithetics are consistently dominated by the other methods.

Table 2. Standard errors for arithmetic average Asian options.

σ     K/S0    No variance   Antithetic   Control   MM2      LHS
              reduction     method       variate
0.2   0.9     0.053         0.052        0.003     0.048    0.049
      1.0     0.344         0.231        0.004     0.162    0.161
      1.1     0.566         0.068        0.006     0.052    0.058
0.4   0.9     0.308         0.297        0.014     0.240    0.248
      1.0     0.694         0.506        0.017     0.352    0.354
      1.1     1.017         0.388        0.021     0.281    0.289
0.6   0.9     0.632         0.583        0.032     0.451    0.455
      1.0     1.052         0.817        0.038     0.566    0.578
      1.1     1.443         0.759        0.047     0.539    0.560

All results are based on n = 100 simulation trials with k = 50 prices in the average. The option parameters are: K = 100, r = 0.10, T = 0.2, with S0 and σ varying as indicated. Standard error estimates are based on m = 10 000 simulations. The geometric average Asian option is used as the control variate. Moment matching (MM2) was applied to the i th price in the average, i = 1, . . . , 5, across replications.

Next we compare these variance reduction techniques in pricing down-and-out call options with discrete barriers. The payoff of this option at expiration is the standard call option payoff if the asset price $S_i$ exceeds the barrier $H$ at all times $t_i = iT/k$, $i = 1, \ldots, k$; otherwise the payoff is zero. The option is knocked out if $S_i \le H$ at any time $t_i$. As a control we use the Black–Scholes price of a standard call. Moment matching and LHS are implemented as with the Asian option. Results are given in Table 3. These are consistent with the pattern in Table 2, except that the superiority of the control variate method is less pronounced.

Although it is always risky to draw conclusions from limited numerical evidence, we suggest the following broad conclusions. The antithetic method is easy to implement, but often leads to only modest error reductions. Moment matching is similarly easy to implement and often leads to significant error reductions, but the error estimation is more difficult and bias is a potential problem. LHS suffers from the same error estimation difficulty but does not introduce bias. The control variate technique can lead to very substantial error reductions, but its effectiveness hinges on finding a good control for each problem.
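The geometric-average control used for the Asian option can be sketched as below. The sketch makes several assumptions that are not taken from the text: the asset follows geometric Brownian motion with risk-neutral drift r, the closed form for the geometric average is derived from the normality of the log average (it is not copied from Turnbull and Wakeman), the control coefficient is estimated from the same n payoffs, and only 500 re-simulations are used. All names are hypothetical.

```python
import numpy as np
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def geometric_asian_call(s0, strike, r, sigma, T, k):
    """Discretely sampled geometric average call; the log average is normal."""
    mean = log(s0) + (r - 0.5 * sigma**2) * T * (k + 1) / (2.0 * k)
    var = sigma**2 * T * (k + 1) * (2 * k + 1) / (6.0 * k**2)
    d2 = (mean - log(strike)) / sqrt(var)
    d1 = d2 + sqrt(var)
    return exp(-r * T) * (exp(mean + 0.5 * var) * norm_cdf(d1) - strike * norm_cdf(d2))

def asian_payoffs(s0, strike, r, sigma, T, k, n, rng):
    """Discounted arithmetic and geometric average payoffs on the same n paths."""
    dt = T / k
    z = rng.standard_normal((n, k))
    log_s = log(s0) + np.cumsum((r - 0.5 * sigma**2) * dt + sigma * sqrt(dt) * z, axis=1)
    disc = exp(-r * T)
    arith = disc * np.maximum(np.exp(log_s).mean(axis=1) - strike, 0.0)
    geom = disc * np.maximum(np.exp(log_s.mean(axis=1)) - strike, 0.0)
    return arith, geom

def control_variate_estimate(arith, geom, geom_price):
    """Arithmetic estimate corrected by the known error of the geometric one."""
    cov = np.cov(arith, geom)
    beta = cov[0, 1] / cov[1, 1]   # estimated from the sample itself
    return arith.mean() - beta * (geom.mean() - geom_price)

rng = np.random.default_rng(0)
params = dict(s0=100.0, strike=100.0, r=0.10, sigma=0.2, T=0.2, k=50)
geom_price = geometric_asian_call(**params)
plain, controlled = [], []
for _ in range(500):               # re-simulation, as in the tables
    arith, geom = asian_payoffs(n=100, rng=rng, **params)
    plain.append(arith.mean())
    controlled.append(control_variate_estimate(arith, geom, geom_price))
se_plain, se_cv = np.std(plain, ddof=1), np.std(controlled, ddof=1)
print(se_plain, se_cv)
```

The control works well here because the arithmetic and geometric averages of the same path are very highly correlated; for deep out-of-the-money strikes the sample variance of the control can be zero, which this sketch does not guard against.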


Table 3. Standard errors for down-and-out call options with discrete barriers.

σ     K/S0    No variance   Antithetic   Control   MM2     LHS
              reduction     method       variate
0.2   0.9     0.96          0.44         0.37      0.43    0.39
      1.0     0.62          0.44         0.13      0.31    0.30
      1.1     0.30          0.28         0.03      0.22    0.22
0.4   0.9     1.59          1.15         0.73      0.95    0.88
      1.0     1.22          1.00         0.45      0.76    0.74
      1.1     0.88          0.82         0.26      0.61    0.61
0.6   0.9     2.19          1.83         1.07      1.44    1.36
      1.0     1.86          1.62         0.80      1.25    1.23
      1.1     1.54          1.40         0.58      1.09    1.09

All results are based on n = 100 simulation trials. There are k = 5 points in the discrete barrier at 95. The other option parameters are: S0 = 100, r = 0.10, T = 0.2, with K and σ varying as indicated. Standard error estimates are based on m = 10 000 simulations. The standard European call option (Black–Scholes formula) is used as the control variate. Moment matching (MM2) was applied to the i th return, i = 1, . . . , 5, across replications.

2.7 Importance sampling

This technique builds on the observation that an expectation under one probability measure can be expressed as an expectation under another through the use of a likelihood ratio or Radon–Nikodym derivative. This idea is familiar in finance because it underlies the representation of prices as expectations under a martingale measure. In Monte Carlo, the change of measure is used to try to obtain a more efficient estimator. We present some examples using this technique; for general background see Bratley et al. (1987) or Hammersley and Handscomb (1964).

As a simple example, consider the evaluation of the Black–Scholes price of a call option – i.e., the computation of $e^{-rT}E[\max\{S_T - K, 0\}]$ with $S_T$ as in (2). A straightforward approach generates samples of the terminal value $S_T$ consistent with a geometric Brownian motion having drift $r$ and volatility $\sigma$, just as in (2). But we are in fact free to generate $S_T$ consistent with any other drift $\mu$, provided we weight the result with a likelihood ratio. For emphasis, we subscript the expectation operator with the drift parameter. Then

$$E_r[\max\{S_T - K, 0\}] = E_\mu[\max\{S_T - K, 0\}\,L],$$

where the likelihood ratio $L$ is the ratio of the lognormal densities with parameters $r$ and $\mu$ evaluated at $S_T$, given by

$$L = \left(\frac{S_T}{S_0}\right)^{(r-\mu)/\sigma^2} \exp\left(\frac{(\mu^2 - r^2)T}{2\sigma^2}\right).$$
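A change of drift for a deep out-of-the-money call can be sketched as follows. To avoid committing to a particular parametrization, the sketch evaluates the likelihood ratio directly as the ratio of the two normal densities of the log return, assuming that (2) is $S_T = S_0\exp((r - \sigma^2/2)T + \sigma\sqrt{T}Z)$. The strike, the sampling drift, and all names are illustrative choices, not values from the text.

```python
import numpy as np
from math import erf, exp, log, sqrt

def bs_call(s0, k, r, sigma, T):
    """Black-Scholes call price, used here only as a reference value."""
    d1 = (log(s0 / k) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return s0 * cdf(d1) - k * exp(-r * T) * cdf(d2)

def call_by_importance_sampling(s0, strike, r, sigma, T, mu, n, seed=0):
    """Sample log(S_T/S_0) under drift mu and reweight back to drift r."""
    rng = np.random.default_rng(seed)
    a_r = (r - 0.5 * sigma**2) * T      # mean log return under drift r
    a_mu = (mu - 0.5 * sigma**2) * T    # mean log return under drift mu
    var = sigma**2 * T
    x = a_mu + np.sqrt(var) * rng.standard_normal(n)
    # log of (normal density with mean a_r) / (normal density with mean a_mu) at x
    log_l = ((x - a_mu)**2 - (x - a_r)**2) / (2.0 * var)
    vals = np.exp(-r * T) * np.maximum(s0 * np.exp(x) - strike, 0.0) * np.exp(log_l)
    return vals.mean(), vals.std(ddof=1) / np.sqrt(n)

s0, strike, r, sigma, T, n = 100.0, 140.0, 0.10, 0.2, 0.2, 100_000
mu = log(strike / s0) / T + 0.5 * sigma**2   # centers the log return at log(K/S0)
est_plain, se_plain = call_by_importance_sampling(s0, strike, r, sigma, T, r, n)
est_is, se_is = call_by_importance_sampling(s0, strike, r, sigma, T, mu, n)
bs_ref = bs_call(s0, strike, r, sigma, T)
print(est_plain, se_plain, est_is, se_is, bs_ref)
```

Setting `mu = r` makes every weight equal to one, so the same function also serves as the straightforward estimator for comparison.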

Indeed, $S_T$ need not even be sampled from a lognormal distribution. The only requirement is that the support of the importance sampling measure contain the support of the original measure so that the likelihood ratio is well-defined; this is an absolute continuity requirement. In the example above, this means that any distribution for $S_T$ whose support includes $(0, \infty)$ is admissible.

Ideally, one would like to choose the importance sampling distribution to reduce variance. In the example above, one obtains a zero-variance estimator by sampling $S_T$ from the density

$$f(x) = c^{-1}\max\{x - K, 0\}\,e^{-rT} g(x),$$

where $g$ is the (lognormal) density of $S_T$ and $c$ is a normalizing constant that makes $f$ integrate to 1. The difficulty is that $c$ is the Black–Scholes price itself, so this method requires knowledge of the solution for its implementation. Nevertheless, it gives some indication of the potential gain from importance sampling.

Reider (1993) has investigated the impact of importance sampling based on a change of drift and volatility. (Changing the volatility is consistent with absolute continuity in a discrete-time approximation of a diffusion though not in the continuous-time limit.) He finds that choosing the importance sampling distribution to have higher drift and volatility provides substantial variance reduction in pricing deep out-of-the-money options. He also investigates the combination of importance sampling with antithetic variates and control variates, and the use of put-call parity for indirect estimation. Nielsen (1994) has explored some related importance sampling ideas in sampling from a binomial tree.

Andersen (1995) has developed a powerful application of importance sampling for simulating interest rates and has applied it to nonlinear stochastic differential equation models. We briefly describe his approach. Let $r_t$ be the instantaneous short rate described, e.g., by a diffusion model. Then

$$B(T) = E\left[\exp\left(-\int_0^T r_t\,dt\right)\right]$$

is the price today of a zero-coupon bond with face value \$1, maturing at time $T$. In, for example, the Cox–Ingersoll–Ross and Vasicek models,¹⁰ $B(T)$ is available in closed form. We may therefore define a new probability measure $\bar{P}$ by setting

$$\bar{P}(A) = E\left[\exp\left(-\int_0^T r_t\,dt - \log B(T)\right) 1_A\right]$$

for any event $A$, where $1_A$ denotes the indicator of the event $A$. Let $\bar{E}$ denote expectation with respect to $\bar{P}$. Then for any random variable $X$, $E[X] = \bar{E}[X L_T]$ where the likelihood ratio $L_T$ is given by

$$L_T = \exp\left(\int_0^T r_t\,dt + \log B(T)\right).$$

In particular, if we take $X = \exp(-\int_0^T r_t\,dt)$, we know that $E[X] = B(T)$ and therefore $B(T)$ is the expectation under $\bar{P}$ of $X L_T$; i.e., of

$$\exp\left(-\int_0^T r_t\,dt\right) \times \exp\left(\int_0^T r_t\,dt + \log B(T)\right).$$

¹⁰ See, e.g., Hull (1993, Chapter 15) for background on these models.

But this simplifies to $B(T)$ itself, meaning that we obtain a zero-variance estimator of the bond price by switching to the new probability measure. Moreover, Andersen shows that sample paths of $r_t$ can be generated under $\bar{P}$ simply by applying a change of drift to the original process.

As described above, the method would appear to require knowledge of the solution for its implementation. Nevertheless, the method has two important applications. The first is in the pricing of contingent claims. Because $\bar{P}$ eliminates the variance of bond prices, it should be effective in reducing variance for pricing, e.g., European bond options expiring at time $T$. Andersen's numerical results bear this out. A second application is in pricing bonds in models with no closed-form solutions: Andersen's results show that the change of drift derived from a tractable model (like CIR or Vasicek) remains effective when applied to an intractable model, and this significantly expands the scope of the method.

Importance sampling is frequently used to make rare events less rare; this is already suggested in Reider's (1994) application to out-of-the-money options. Our next example further highlights this aspect through a new application to barrier options. We consider a knock-in option far from the barrier and use importance sampling to increase the probability of a payout. Suppose the barrier is monitored at discrete times $n\Delta t$, $n = 0, 1, \ldots, m$, with $\Delta t = T/m$. Set the barrier at $H = S_0 e^{-b}$ and the strike at $K = S_0 e^{c}$, with $b, c > 0$. A down-and-in call pays $S_T - K$ at time $T$ if $S_T > K$ and $S_{n\Delta t} < H$ for some $n = 1, \ldots, m$. We can write the price of the underlying at monitoring instants as

$$S_{n\Delta t} = S_0 e^{U_n}, \qquad U_n = \sum_{i=1}^{n} X_i,$$

with the $X_i$ i.i.d. normal having mean $(r - \tfrac{1}{2}\sigma^2)\Delta t$ and variance $\sigma^2\Delta t$. Let $\tau$ be the first time $U_n$ drops below $-b$; then the probability of a payout is $P(\tau < m, U_m > c)$. If $b$ and $c$ are large, this probability is small, and most simulation runs return zero. Through importance sampling, we can increase this probability and thus get more information out of each run.

Consider alternative probability measures $P_{\mu_1,\mu_2}$ that give $U_n$ a drift of $\mu_1\Delta t$ until $\tau$ and then switch the drift to $\mu_2\Delta t$. Intuitively, we would like to make $\mu_1 < 0$ to drive the asset price to the barrier and then make $\mu_2 > 0$ to drive it above the strike. For any $\mu_1, \mu_2$, we have

$$P(\tau < m, U_m > c) = E_{\mu_1,\mu_2}\left[L_{\mu_1,\mu_2} 1_{\{\tau < m,\, U_m > c\}}\right].$$

The likelihood ratio is given by

$$L_{\mu_1,\mu_2} = \exp\left(-\theta_1 U_\tau + \psi(\theta_1)\tau - \theta_2(U_m - U_\tau) + \psi(\theta_2)(m - \tau)\right),$$

where $\theta_i = (\mu_i - r + \tfrac{1}{2}\sigma^2)/\sigma^2$, $i = 1, 2$, and $\psi(\theta) = (r - \tfrac{1}{2}\sigma^2)\Delta t\,\theta + \tfrac{1}{2}\sigma^2\Delta t\,\theta^2$. This follows from algebraic simplification of the product of the ratios of the densities of the $X_i$ under the original and new means.

It remains to choose $\mu_1, \mu_2$. Intuitively, most of the variability in $L_{\mu_1,\mu_2}$ comes from $\tau$ (the time of the barrier crossing): for large $b, c$, in the event of a payout we expect to have $U_\tau \approx -b$ and $U_m \approx c$ so these terms should contribute less variability. If we choose $\mu_1, \mu_2$ so that $\psi(\theta_1) = \psi(\theta_2)$, the likelihood ratio simplifies to

$$L_{\mu_1,\mu_2} = \exp\left(-(\theta_1 - \theta_2)U_\tau - \theta_2 U_m + m\psi(\theta_2)\right),$$

which depends on $\tau$ only through $U_\tau \approx -b$. The condition $\psi(\theta_1) = \psi(\theta_2)$ translates to $\mu_1 = -\mu_2 \equiv -\mu$, so it only remains to choose this drift parameter. We choose it so that the time to traverse the straight line path from 0 to $-b$ and then to $c$ at rate $\mu$ equals the number of steps $m$:

$$\frac{b}{\mu\Delta t} + \frac{b + c}{\mu\Delta t} = m;$$

i.e., $\mu = (2b + c)/T$. Interestingly, this change of drift does not depend on the original mean increment $(r - \tfrac{1}{2}\sigma^2)\Delta t$. Table 4 illustrates the performance of this method.
The computational effort with and without importance sampling is essentially the same, so the efficiency improvement is just the ratio of the variances. The improvement varies widely but shows the potential for dramatic gains from importance sampling, particularly when the barrier is far from the current price of the underlying.¹¹

¹¹ The standard errors in the table are all quite small, but so are the associated option values. Hence, the relative error without importance sampling is quite significant.
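The two-drift scheme for the down-and-in call can be sketched as below. The sketch accumulates the log likelihood ratio step by step, which is equivalent to the product of per-step density ratios described above. It uses the first parameter set of Table 4 but only n = 20 000 paths, so its standard errors are not comparable with the table; the function name and interface are hypothetical.

```python
import numpy as np

def down_and_in_call(s0, strike, barrier, r, sigma, T, m, n, importance, seed=0):
    """Discretely monitored down-and-in call; optional change of drift at the barrier."""
    rng = np.random.default_rng(seed)
    dt = T / m
    b, c = np.log(s0 / barrier), np.log(strike / s0)    # both assumed positive
    base_mean = (r - 0.5 * sigma**2) * dt               # original mean of X_i
    var = sigma**2 * dt
    if importance:
        mu = (2.0 * b + c) / T
        mean1, mean2 = -mu * dt, mu * dt                # before / after the crossing
    else:
        mean1 = mean2 = base_mean
    theta1, theta2 = (mean1 - base_mean) / var, (mean2 - base_mean) / var
    psi = lambda th: base_mean * th + 0.5 * var * th**2
    u = np.zeros(n)
    log_l = np.zeros(n)
    crossed = np.zeros(n, dtype=bool)
    for _ in range(m):
        mean = np.where(crossed, mean2, mean1)
        theta = np.where(crossed, theta2, theta1)
        x = mean + np.sqrt(var) * rng.standard_normal(n)
        u += x
        log_l += -theta * x + psi(theta)                # per-step log density ratio
        crossed |= u < -b
    payoff = np.where(crossed, np.maximum(s0 * np.exp(u) - strike, 0.0), 0.0)
    vals = np.exp(-r * T) * payoff * np.exp(log_l)
    return vals.mean(), vals.std(ddof=1) / np.sqrt(n)

args = dict(s0=95.0, strike=100.0, barrier=92.0, r=0.05, sigma=0.15, T=0.25, m=50, n=20_000)
est_plain, se_plain = down_and_in_call(importance=False, **args)
est_is, se_is = down_and_in_call(importance=True, **args)
print(est_plain, se_plain, est_is, se_is)
```

With `importance=False` both twisting parameters are zero, so every weight equals one and the function reduces to the straightforward estimator.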


Table 4. Standard errors for down-and-in calls: importance sampling.

H     K      No variance   Importance   Efficiency
             reduction     sampling     ratio
92    100    0.00309       0.00069      20
92    105    0.00129       0.00014      85
88    96     0.00110       0.00011      96
85    90     0.00084       0.00008      116
92    105    0.01418       0.00541      7
85    105    0.00328       0.00038      75
75    96     0.00030       0.00001      1124
75    85     0.00148       0.00010      222

All results are based on n = 100 000 simulation trials. The parameters are: S0 = 95, σ = 0.15, and r = 0.05, with the barrier H and strike K varying as indicated. The first four cases have T = 0.25 and m = 50; the last four have T = 1 and m = 250.

In recent work, Andersen and Brotherton-Ratcliffe (1996) and Beaglehole, Dybvig, and Zhou (1997) show how to eliminate the bias caused by using a simulation at a discrete set of times to price continuous options on extrema, e.g., barrier or lookback options.

2.8 Conditional Monte Carlo

This approach to efficiency improvement exploits the variance reducing property of conditional expectation: for any random variables $X$ and $Y$, $\mathrm{Var}[E[X|Y]] \le \mathrm{Var}[X]$, with strict inequality except in trivial cases.¹² In replacing an estimator by its conditional expectation we reduce variance essentially because we are doing part of the integration analytically and leaving less to be done by Monte Carlo.

Hull and White (1987) use this idea to price options with stochastic volatilities. Consider a model in which an asset price and its volatility evolve as follows:

$$dS = rS\,dt + \nu S\,dW_1$$
$$d\nu^2 = \alpha\nu^2\,dt + \xi\nu^2\,dW_2,$$

with $W_1, W_2$ independent. Suppose we want to price a standard European call on $S$. A straightforward approach simulates sample paths of $\nu$ and $S$ up to time $T$ and averages $\max\{S_T - K, 0\}$ over all paths. An alternative notes that, conditional on the path of $\nu_t$ in $[0, T]$, the asset price $S_t$ may be treated as having a time-varying but deterministic volatility. Thus, conditional on the volatility path, the option can be priced by the Black–Scholes formula:

$$e^{-rT}E[\max\{S_T - K, 0\} \mid \nu_t, 0 \le t \le T] = BS(S_0, K, r, T, \sqrt{V_T}),$$

where

$$V_T = \frac{1}{T}\int_0^T \nu_t^2\,dt$$

is the average squared volatility over the path, and $BS(S, K, r, T, \sigma)$ is the Black–Scholes price of a call with constant volatility $\sigma$ and the other parameters as indicated. Using this conditional expectation as the estimator is sure to reduce variance and may even reduce computational effort since it obviates simulation of $S$. It is worth emphasizing that both straightforward Monte Carlo and conditional Monte Carlo would have to be applied to discrete-time approximations of the continuous processes above. Also, the applicability of conditional Monte Carlo in this setting relies on the fact that the evolution of the asset price does not influence the volatility path. See Willard (1997) for an extension to the case of correlated $W_1$ and $W_2$.

As a further illustration of the use of conditional Monte Carlo, we give a new application to the pricing of a down-and-in call with a discretely monitored barrier. Let $0 = t_0 < t_1 < \cdots < t_m = T$ be the monitoring instants and $S_{t_i}$ the price of the underlying at the $i$th such instant. The option price is

$$E[e^{-rT}\max\{S_T - K, 0\}1_{\{\tau_H \le T\}}],$$

where $H$ is the barrier and $\tau_H$ is the first monitoring time at which the barrier is breached. Straightforward simulation generates paths of the underlying and evaluates the estimator $e^{-rT}\max\{S_T - K, 0\}1_{\{\tau_H \le T\}}$. Our first alternative conditions on $\{S_0, \ldots, S_{\tau_H}\}$, the path of the underlying until the barrier crossing; i.e.,

$$E[e^{-rT}\max\{S_T - K, 0\}1_{\{\tau_H \le T\}}] = e^{-rT}E[E[\max\{S_T - K, 0\}1_{\{\tau_H \le T\}} \mid S_0, \ldots, S_{\tau_H}]] = e^{-rT}E[BS(S_{\tau_H}, K, r, T - \tau_H, \sigma)1_{\{\tau_H \le T\}}].$$

This yields the estimator

$$\mathrm{CMC}_1 = e^{-rT}BS(S_{\tau_H}, K, r, T - \tau_H, \sigma)1_{\{\tau_H \le T\}}.$$

This says: simulate until the barrier is crossed or the option expires; if the barrier was crossed, return the Black–Scholes price starting from price $S_{\tau_H}$ with maturity $T - \tau_H$.

¹² This is a direct consequence of Jensen's inequality for conditional expectations.
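The stochastic-volatility example earlier in this subsection can be sketched as follows. The discretization is an assumption of the sketch, not something specified in the text: the variance $\nu^2$ is advanced exactly as a geometric Brownian motion on a grid, the asset uses a log-Euler step with the variance frozen at the left end of each step, and $V_T$ is the corresponding left-point average. Under that scheme the Black–Scholes value with volatility $\sqrt{V_T}$ is the exact conditional expectation of the discretized payoff. The parameter values and names are illustrative.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)

def bs_call(s0, k, r, T, vol):
    """Black-Scholes call price; vol may be a NumPy array of volatilities."""
    d1 = (np.log(s0 / k) + (r + 0.5 * vol**2) * T) / (vol * np.sqrt(T))
    d2 = d1 - vol * np.sqrt(T)
    cdf = lambda x: 0.5 * (1.0 + _erf(x / sqrt(2.0)))
    return s0 * cdf(d1) - k * np.exp(-r * T) * cdf(d2)

def stochastic_vol_call(s0, k, r, T, v0, alpha, xi, steps, n, seed=0):
    """Return per-path plain and conditional Monte Carlo values."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    var = np.full(n, v0)                  # nu^2 at the left end of the step
    log_s = np.full(n, np.log(s0))
    var_sum = np.zeros(n)
    for _ in range(steps):
        var_sum += var
        z1 = rng.standard_normal(n)       # drives the asset (W1)
        z2 = rng.standard_normal(n)       # drives the variance (W2), independent
        log_s += (r - 0.5 * var) * dt + np.sqrt(var * dt) * z1
        var = var * np.exp((alpha - 0.5 * xi**2) * dt + xi * np.sqrt(dt) * z2)
    plain = np.exp(-r * T) * np.maximum(np.exp(log_s) - k, 0.0)
    conditional = bs_call(s0, k, r, T, np.sqrt(var_sum / steps))
    return plain, conditional

plain, conditional = stochastic_vol_call(
    s0=100.0, k=100.0, r=0.10, T=0.5, v0=0.04, alpha=0.0, xi=1.0, steps=50, n=20_000)
print(plain.mean(), plain.std(ddof=1), conditional.mean(), conditional.std(ddof=1))
```

The conditional values could be computed without simulating the asset at all; the sketch simulates both only so that the two estimators can be compared on the same volatility paths.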


Our second alternative conditions one step earlier, at each monitoring instant evaluating the probability that the barrier will be breached for the first time at the next monitoring instant:

$$E[e^{-rT}\max(S_T - K, 0)1_{\{\tau_H \le T\}}] = e^{-rT}E\left[\max\{S_T - K, 0\}\sum_{n=1}^{m} 1_{\{\tau_H = t_n\}}\right]$$
$$= e^{-rT}E\left[\sum_{n=1}^{m} E[\max\{S_T - K, 0\}1_{\{\tau_H = t_n\}} \mid S_{t_0}, \ldots, S_{t_{n-1}}]\right]$$
$$= e^{-rT}E\left[\sum_{n=0}^{\tau_H - 1} BS2(S_{t_n}, K, H, r, t_{n+1} - t_n, T - t_n, \sigma)\right],$$

where $BS2(S, K, H, r, t, T, \sigma)$ is the price of a down-and-in call that knocks in only if the underlying is below $H$ at time $t$. We thus arrive at the estimator

$$\mathrm{CMC}_2 = e^{-rT}\sum_{n=0}^{\tau_H - 1} BS2(S_{t_n}, K, H, r, t_{n+1} - t_n, T - t_n, \sigma),$$

with

$$BS2(S, K, H, r, t, T, \sigma) = S\,N_2(a_1, b_1, \rho) - e^{-rT}K\,N_2(a_2, b_2, \rho),$$

where $\rho = -\sqrt{t/T}$, $N_2$ is the bivariate cumulative normal distribution with correlation $\rho$, and

$$a_1 = \frac{\log(S/K) + (r + \tfrac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}, \qquad a_2 = a_1 - \sigma\sqrt{T},$$
$$b_1 = \frac{\log(H/S) - (r + \tfrac{1}{2}\sigma^2)t}{\sigma\sqrt{t}}, \qquad b_2 = b_1 + \sigma\sqrt{t}.$$

(The derivation of this formula is fairly standard and therefore omitted.) The CMC2 estimator can be expected to have lower variance than the CMC1 estimator because it conditions on less information and thus does more integration analytically. In fact, CMC2 is not a conditional Monte Carlo estimator in the strict sense because it conditions on different information at different times, making it more precisely a filtered Monte Carlo estimator in the sense of Glasserman (1996). Because the two estimators above have the same expectation, their difference has mean 0 and can be used as a control variate to form a further estimator

$$\mathrm{CMC} = \mathrm{CMC}_1 + \beta(\mathrm{CMC}_2 - \mathrm{CMC}_1).$$

With $\beta$ optimized, this has lower variance than either individual estimator.

Numerical results appear in Table 5. As expected, each level of conditioning further reduces variance, and the combined estimator achieves the lowest standard error of all. However, repeated evaluation of the function BS2 turns out to be time-consuming, making CMC1 overall the most efficient estimator.

Table 5. Comparison of CMC estimators for down-and-in call.

Method   Standard    Computation   s√t
         Error (s)   Time (t)
Base     0.108       0.133         0.039
CMC1     0.034       0.117         0.012
CMC2     0.021       3.233         0.038
CMC      0.014       3.367         0.026

Results based on n = 10 000 replications with σ = 0.4, r = 0.10, S0 = K = 100, H = 95, T = 0.5, and 10 equally spaced monitoring times.

3 Low-discrepancy sequences

For complex problems the performance of the basic Monte Carlo approach may be rather unsatisfactory because the error is $O(1/\sqrt{n})$. We can sometimes improve convergence by using pre-selected deterministic points to evaluate the integral. The accuracy of this approach depends on the extent to which these deterministic points are evenly dispersed throughout the domain of integration. Discrepancy measures the extent to which the points are evenly dispersed throughout a region: the more evenly dispersed the points are the lower the discrepancy. Low-discrepancy sequences are often called quasi-random sequences even though they are not at all random.¹³ We shall use both terms in this paper.

Low-discrepancy methods have recently been used to tackle a number of problems in finance. These applications are more fully described in papers by Birge (1994), Joy, Boyle, and Tan (1996) and Paskov and Traub (1995); the use of quasi-Monte Carlo is also proposed in Cheyette (1992). In this section we describe how the approach works and review some of the recent applications. The book by Press et al. (1992) provides an intuitive introduction to low-discrepancy sequences and quasi-Monte Carlo methods. Spanier and Maize (1994) provide a recent overview of quasi-random methods and how they can be used to evaluate integrals with medium sized samples. Niederreiter (1992) and Tezuka (1995) provide in-depth analyses of low-discrepancy sequences. Moskowitz and Caflisch (1996) discuss recent developments in improving the convergence of quasi-random Monte Carlo methods. In earlier work, Haselgrove (1961) describes a method for multivariate integration that can be applied to security pricing. Haselgrove's method is developed for problems of eight dimensions or less and our numerical experiments suggest that it is competitive with the low-discrepancy sequences investigated in this section for problems of this size.

The basic idea behind the approach is quite intuitive and is readily explained in the one-dimensional case. Suppose we wish to integrate a function $f(x)$ over the interval $[0, 1]$ using a sequence of $n$ points. Rather than pick a random sequence suppose we pick a deterministic sequence of points that are, in some sense, evenly distributed. With this choice, the accuracy of the estimate will be higher than that obtained using the crude Monte Carlo approach. If we use an equally spaced grid we obtain the trapezoidal method of numerical integration which has an error of $O(n^{-1})$.

However, the more challenging task is to evaluate multi-dimensional integrals. Without loss of generality we can assume that the domain of integration is contained in the $d$-dimensional unit hypercube. The advantages of the uniformly spaced grid in the one-dimensional case do not carry over to higher dimensions. The principal reason is that the error bound for the $d$-dimensional trapezoidal rule is $O(n^{-2/d})$. In addition, if we use an evenly spaced Cartesian grid, we would have to decide the number of points in advance to achieve uniformity. This is restrictive because, in numerical applications, we would like to be able to add points sequentially until some termination criterion is met. Low-discrepancy sequences have the property that as successive points are added the entire sequence of points still remains more or less evenly dispersed throughout the region.

Niederreiter (1992) gives a detailed analysis of the discrepancy of a sequence. Here, we just briefly recall the definition. Suppose we have a sequence of $n$ points $\{x_1, x_2, \ldots, x_n\}$ in the $d$-dimensional half-open unit cube, $I^d = [0, 1)^d$, and a subset $J$ of $I^d$. We define

$$D(J; n) = \frac{A(J; n)}{n} - V(J),$$

¹³ Thus the name quasi-random is very misleading since these sequences are deterministic. However, it seems to be sanctioned by usage.

where $A(J; n)$ is the number of $k$, $1 \le k \le n$, with $x_k \in J$ and $V(J)$ is the volume of $J$. The discrepancy, $D_n$, of the sequence is defined to be the supremum of $|D(J; n)|$ over all $J$. The star discrepancy, $D_n^*$, is obtained by taking the supremum over sets $J$ of the form

$$\prod_{i=1}^{d} [0, u_i).$$

In the one-dimensional case there is a simple explicit form for the (star)14 discrepancy of a sequence of n points. If we label the points so that $0 \le x_1 \le \cdots \le x_n \le 1$, then the discrepancy of this sequence is

$$D_n^* = \frac{1}{2n} + \max_{k=1,\ldots,n} \left| x_k - \frac{2k-1}{2n} \right|.$$

We can see that the star discrepancy is at least 1/(2n) and that the lowest value is attained when

$$x_k = \frac{2k-1}{2n}, \qquad 1 \le k \le n.$$

In higher dimensions there is no simple form for the discrepancy of a sequence. There are several examples of low-discrepancy sequences, including the sequences proposed by Halton (1960), Sobol’ (1967), Faure (1982), and Niederreiter (1988).15 For these sequences the asymptotic form of the star discrepancy has been shown to be

$$D_n^* = O\left( \frac{(\log n)^d}{n} \right).$$

This bound for the discrepancy involves a constant which in general depends on the dimension d of the sequence. These constants are very difficult to estimate accurately in high dimensions. For large values of d the constants “are often ridiculously large for reasonable values of n” according to Spanier and Maize (1994, p. 23). Furthermore, for high dimensions it may take a long time before the discrepancy reaches its asymptotic level. Morokoff and Caflisch (1995) note that for intermediate values of n the discrepancy may be $O(\sqrt{n})$. They suggest that the transition to $O(n^{-1} (\log n)^d)$ occurs at around values of $n = e^d$. For large d this will be an enormous number.

The error in numerical integration using a low-discrepancy sequence admits a deterministic bound. The bound reflects both the discrepancy of the sequence of points used to evaluate the integral and the regularity of the function. The result is contained in the following theorem.

Theorem (Koksma–Hlawka) Let $I^d = [0, 1)^d$ and let f have bounded variation V(f) on $[0, 1]^d$ in the Hardy–Krause16 sense. Then for any $x_1, x_2, \ldots, x_n \in I^d$ we have

$$\left| \frac{1}{n} \sum_{k=1}^{n} f(x_k) - \int_{I^d} f(u)\, du \right| \le V(f)\, D_n^*.$$

14 For the rest of the paper we simply use the term discrepancy rather than star discrepancy to refer to $D_n^*$.

15 Interestingly, linear congruential generators – frequently used to generate the pseudo-random numbers that drive ordinary Monte Carlo – produce sets of points with low discrepancy over the entire period of the generator; see Niederreiter (1976). This suggests the possibility of choosing such a generator with period roughly equal to the total number of points required as a type of quasi-Monte Carlo method. In ordinary Monte Carlo, one prefers instead that the period be many orders of magnitude larger than the number of points required. We thank Peter Hellekalek of the University of Salzburg for this observation.

16 For a more complete discussion of the Hardy–Krause definition of variation and details on this theorem see Niederreiter (1992).
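The one-dimensional formula above is simple enough to evaluate directly. The following is a minimal sketch in Python; the function names `star_discrepancy_1d` and `midpoints` are illustrative choices, not taken from the text.

```python
def star_discrepancy_1d(points):
    """Star discrepancy of points in [0, 1] using the closed-form 1-D formula."""
    xs = sorted(points)  # label the points so that x_1 <= ... <= x_n
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one point")
    # D*_n = 1/(2n) + max_k |x_k - (2k - 1)/(2n)|
    worst = max(abs(x - (2 * k - 1) / (2 * n)) for k, x in enumerate(xs, start=1))
    return 1 / (2 * n) + worst


def midpoints(n):
    """The point set x_k = (2k - 1)/(2n), which attains the lower bound 1/(2n)."""
    return [(2 * k - 1) / (2 * n) for k in range(1, n + 1)]


if __name__ == "__main__":
    print(star_discrepancy_1d(midpoints(8)))        # the minimum, 1/(2n) with n = 8
    print(star_discrepancy_1d([0.1, 0.2, 0.3, 0.4]))  # a clustered set scores worse
```

Sorting first means the caller need not supply the points in order; the midpoint set illustrates the statement that 1/(2n) is the smallest attainable value.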


The error bound provided by this theorem, while it is of theoretical interest, is of little help in most practical situations. The theoretical bound normally overestimates the actual error by a wide margin and V ( f ) may be difficult to evaluate or even approximate. We have noted that the constants buried in the bounds for the discrepancy are large. Another reason for the coarseness of the bound is that the Koksma–Hlawka theorem does not reflect additional smoothness in f . Intuitively we would expect the approximation to be better as f becomes smoother. In finance applications the payoffs are normally continuous functions of the variables (with some important exceptions – payoffs on digital and barrier options are discontinuous), but may not be sufficiently smooth to have finite variation because of functions like “max” embedded in the payoffs. Hlawka (1971) provides an alternative bound under weaker smoothness requirements. To date, studies using low-discrepancy sequences in finance applications find that the errors produced are substantially lower than the corresponding errors generated by crude Monte Carlo. Joy, Boyle, and Tan (1996) used Faure sequences to price several complex derivative securities. They found that the quasi-Monte Carlo approach resulted in significantly smaller errors than the standard Monte Carlo approach. They confirmed that the actual error bound (for cases in which it could be computed precisely) was dramatically less than the bound computed from the Koksma–Hlawka inequality. Paskov and Traub (1995) used both Sobol’ sequences and Halton sequences to evaluate mortgage-backed security prices. 
Their work involves the evaluation of integrals with dimensions up to 360; they find that Sobol’ sequences are more efficient than Halton sequences and that the quasi-random approach outperforms the standard Monte Carlo approach for these types of problems.17 Paskov and Traub’s results stand in contrast to the claim that is sometimes found in the literature18 that the superiority of low-discrepancy algorithms vanishes for intermediate values of d around 30. Bratley, Fox, and Niederreiter (1992) conducted practical numerical experiments using low-discrepancy sequences and conclude that standard Monte Carlo is superior to quasi-Monte Carlo for high dimensions, say greater than 12. They used Sobol’ and Niederreiter sequences in their tests. They conclude that in high dimensions, “quasi-Monte Carlo seems to offer no practical advantage over pseudo-Monte Carlo because the discrepancy bound for the former is far larger than $\sqrt{n}$ for $n = 2^{30}$, say.” (In a personal communication, Fox adds that the crossover probably depends a lot on the sequence.) The reason for the difference between this verdict and the results of the finance applications may be that the integrands typically found in finance applications behave better than those used by numerical analysts19 to compare different algorithms. Another important consideration is that financial applications typically involve discounting, and this may effectively reduce dimensionality; for example, some of the 360 months in the life of a mortgage may have little influence on the value of a mortgage-backed security. Nevertheless, the experience of Bratley et al. (1992) serves as a useful caution against assuming that quasi-Monte Carlo will outperform standard Monte Carlo in all situations.

17 Bratley et al. (1992) note that the Niederreiter sequence they tested theoretically beats Sobol’ sequences in dimensions higher than seven.

18 See, for example, Rensburg and Torrie (1993) or Morokoff and Caflisch (1995).

Some theoretical differences among low-discrepancy sequences can be understood through the concepts of (t, m, s)-nets and (t, s)-sequences; these are discussed in detail in Niederreiter (1992). Briefly, an elementary interval in base b in dimension s is a set of the form

$$\prod_{j=1}^{s} \left[ \frac{a_j}{b^{k_j}}, \frac{a_j + 1}{b^{k_j}} \right)$$

with $k_j$, $a_j$ nonnegative integers and $a_j < b^{k_j}$. A (t, m, s)-net (with 0 ≤ t ≤ m) is a set of $b^m$ points in the s-dimensional hypercube such that every elementary interval of volume $b^{t-m}$ contains $b^t$ points. Speaking loosely, this means that the proportion of points in each sufficiently large box equals the volume of the box. Smaller t implies greater uniformity. An infinite sequence forms a (t, s)-sequence if for all m ≥ t certain finite subsequences of length $b^m$ form (t, m, s)-nets in base b. Sobol’ points are (t, s)-sequences in base 2 and Faure points are (0, s)-sequences in prime bases not less than s. Thus, Faure points achieve the smallest value of t, but at the expense of a large base. A smaller base implies that uniformity holds over shorter subsequences.

An important issue in the use of quasi-Monte Carlo concerns the termination criterion, since the Koksma–Hlawka bound is often of little practical value. Various heuristics are available. Birge (1994) suggests that a rough bound may be obtained by tracking the maximum and minimum values over a period that shows equal numbers of increases and decreases.
For instance the criterion could be to stop at the first set of two thousand observations in which the numbers of increases and decreases are within ten percent of each other. He suggests that the maximum and minimum realized values could be used as bounds on the true value. Fox (1986) suggests that we compare the estimate of the integral based on a sample of 2n points with the estimate based on n points and stop if the answer lies within some tolerance level. Paskov and Traub (1995) use a similar termination criterion based on successive errors: stop when the difference between two consecutive approximations using 10 000i, i = 1, 2, ..., 1000, sample points falls below some threshold. Owen (1995a, 1995b) proposes a hybrid of Monte Carlo and low-discrepancy methods which provides error estimates and has good convergence properties. In addition to these approaches, one can also run standard Monte Carlo at the outset and use the probabilistic error term to assess when enough low-discrepancy points have been used in the quasi-random calculation. This benchmarking with standard Monte Carlo would be useful if the same set of calculations were being carried out frequently with only slightly different input values. This situation is common in finance applications. There is often a need to perform the same set of calculations frequently; e.g., the risk analysis of a book of business at the end of each day. In these cases one can conduct experiments to see which sets of low-discrepancy sequences provide the best results. The right number of low-discrepancy points could be determined just once at the outset.

19 For example, one of the integrals used by Bratley, Fox, and Niederreiter (1992) was

$$\int_0^1 \cdots \int_0^1 \prod_{k=1}^{d} k \cos(k x_k)\, dx_1 \cdots dx_d.$$

This integrand is highly periodic for large values of d.

Before leaving this section, we should mention some recent advances and new techniques to improve the performance of quasi-random Monte Carlo. Niederreiter and Xing (1996), Tezuka (1994), and Ninomiya and Tezuka (1996) have proposed new low-discrepancy sequences that appear to have the potential to perform substantially better than previous methods. We have noted that the efficiency of quasi-random Monte Carlo improves as the integrand becomes smoother. Moskowitz and Caflisch (1996) illustrate procedures that can be used for this purpose. It is sometimes possible to enhance the performance of quasi-random sequences by reducing the effective dimension of the problem. Moskowitz and Caflisch also indicate how this can be accomplished in the discretization of a Wiener process and in the solution of the Feynman–Kac equation. 
This is relevant for finance applications since the prices of derivative securities have a Feynman–Kac representation. See Acworth, Broadie, and Glasserman (1997), Berman (1996), and Caflisch, Morokoff, and Owen (1998) for recent work applying low-discrepancy sequences with alternative constructions of Wiener processes. Spanier and Maize (1994) discuss a battery of techniques that can be used to improve the performance of quasi-Monte Carlo methods for relatively small sample sizes. Next we compare the Monte Carlo method using pseudo-random numbers with the Faure, Halton, and Sobol’ low-discrepancy methods.
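The doubling rule attributed to Fox (1986) above (compare the estimate from 2n points with the estimate from n points) can be sketched as follows. This is only an illustration under stated assumptions: the Halton sequence is built from radical inverses in bases 2 and 3, and the function names, the starting size `n0`, the tolerance, and the test integrand are all hypothetical choices rather than anything specified in the text.

```python
def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i >= 0 in the given base."""
    x, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        x += digit * f
        f /= base
    return x


def halton(i, bases):
    """The i-th Halton point, one radical inverse per coordinate."""
    return tuple(radical_inverse(i, b) for b in bases)


def integrate_by_doubling(f, bases, n0=256, tol=1e-4, max_n=1 << 16):
    """Double the number of points until two successive estimates agree within tol.

    Returns (estimate, points_used). If max_n is reached first, the last
    estimate is returned without any guarantee that the tolerance was met.
    """
    n = n0
    total = sum(f(halton(i, bases)) for i in range(1, n + 1))
    estimate = total / n
    while 2 * n <= max_n:
        # Reuse the first n points and add points n+1, ..., 2n.
        total += sum(f(halton(i, bases)) for i in range(n + 1, 2 * n + 1))
        n *= 2
        new_estimate = total / n
        if abs(new_estimate - estimate) < tol:
            return new_estimate, n
        estimate = new_estimate
    return estimate, n


if __name__ == "__main__":
    # Integral of x*y over the unit square; the exact value is 1/4.
    value, used = integrate_by_doubling(lambda u: u[0] * u[1], bases=(2, 3))
    print(value, used)
```

Agreement between successive estimates is a heuristic, not a bound: two estimates can agree while both are still far from the true value, which is one reason the text also mentions benchmarking against standard Monte Carlo.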

3.1 Numerical results

For an initial comparison, we test the methods on the problem of pricing a European option on a single underlying asset with the usual Black–Scholes assumptions. In this framework, the Black–Scholes formula can be evaluated to give the true option values in order to compare alternative methods. Rather than using a single option, we evaluate the methods on a random sample of 500 options. The probability distribution of the parameters is chosen to represent a reasonable range of values in practical applications.20 The error measure that we use is root-mean-squared (RMS) relative error defined by

$$\text{RMS} = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} \left( \frac{\hat{C}_i - C_i}{C_i} \right)^2 }, \tag{12}$$

where i is the index of the m = 500 options in the test set, $C_i$ is the true option value, and $\hat{C}_i$ is the estimated option value. The results are given in Figure 1.

Figure 1 plots RMS relative error against the number of points, n. The Monte Carlo method (i.e., using pseudo-random numbers) displays the expected $O(1/\sqrt{n})$ convergence: e.g., increasing n by a factor of 100 decreases the RMS error by a factor of 10. The low-discrepancy method using Faure sequences dominates the Monte Carlo method. Indeed, 129 Faure points give an error lower than 1000 Monte Carlo points. The Sobol’ method is the best of the three methods tested. Using 192 Sobol’ points gives an error lower than 10 000 Monte Carlo points.

A major consideration in the comparison of methods is the overall computation time, not just the number of points. The Sobol’ sequence numbers can be generated significantly faster than Faure numbers (see, e.g., Bratley and Fox 1988) and as fast as most pseudo-random number methods. Hence, in the important RMS error versus computation time comparison, the relative advantage of the Sobol’ method increases.

A low-discrepancy sequence will often have additional uniformity properties at certain points in the sequence (see, e.g., Fox 1986 and Bratley and Fox 1988). For example, in the Sobol’ sequence the running average returns to 0.5 at the points $n = 2^k - 1$ for k = 1, 2, .... One might expect that choosing n to be one of these “favorable” points would lead to better option price estimates. For large values of n, the advantage of using favorable points becomes negligible, but for small n the effect can be quite significant.
Indeed, in the experiment above, using the Sobol’ points 1 through 254 gives an RMS error of 10%, while using the points 1 through 255 gives an RMS error of 4%.21 Better results are often obtained by ignoring an initial portion of a low-discrepancy sequence. For example, using the Sobol’ points 1 through 63 gives an RMS error of 13%, while using the Sobol’ points 64 through 127 gives an RMS error of 2%. In the results in Figure 1, the Sobol’ sequence was always started at point 64, so the label 192 in Figure 1 corresponds to the 192 Sobol’ points from 64 to 255. Similarly, the Faure sequence was always started at point 16, so the label 129 in Figure 1 corresponds to the 129 Faure points from 16 to 144.

20 The details of the distribution are given in Broadie and Detemple (1996).

21 We take the first point of the Sobol’ sequence to be 0.5, not 0.0.

Fig. 1. RMS relative error vs. number of points. [Log–log plot of RMS relative error (from 10^0 down to 10^-4) against n (from 10^2 to 10^5) for Monte Carlo (+), Faure (x), and Sobol’ (*); the plotted points carry labels such as 129 (Faure) and 192 (Sobol’) giving the number of points used.]
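The RMS relative error in (12) can be computed with a few lines of code. The sketch below is an assumption-laden illustration, not the experiment reported above: it uses the base-2 radical inverse as a one-dimensional low-discrepancy sequence (so that the first point is 0.5), maps it to a normal variate by inversion, and evaluates three hand-picked options rather than the 500 random options of the text. All function names and parameter sets are hypothetical.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def black_scholes_call(s0, k, r, sigma, t):
    """Black-Scholes price of a European call, used as the true value C_i."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * _NORMAL.cdf(d1) - k * math.exp(-r * t) * _NORMAL.cdf(d2)


def van_der_corput(i):
    """Base-2 radical inverse of i >= 1; the value lies strictly inside (0, 1)."""
    x, f = 0.0, 0.5
    while i > 0:
        i, bit = divmod(i, 2)
        x += bit * f
        f *= 0.5
    return x


def qmc_call(s0, k, r, sigma, t, n, start=1):
    """Estimate the call price from n low-discrepancy points start, ..., start+n-1."""
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for i in range(start, start + n):
        z = _NORMAL.inv_cdf(van_der_corput(i))  # uniform -> standard normal
        total += max(s0 * math.exp(drift + vol * z) - k, 0.0)
    return math.exp(-r * t) * total / n


def rms_relative_error(estimates, true_values):
    """Equation (12): sqrt of the mean squared relative error."""
    m = len(true_values)
    return math.sqrt(sum(((e - c) / c) ** 2 for e, c in zip(estimates, true_values)) / m)


# Hypothetical test set: (S0, K, r, sigma, T).
OPTIONS = [(100, 100, 0.10, 0.40, 0.2), (100, 90, 0.05, 0.30, 1.0), (100, 110, 0.05, 0.30, 1.0)]

if __name__ == "__main__":
    truths = [black_scholes_call(*p) for p in OPTIONS]
    for n in (254, 255, 1023):
        estimates = [qmc_call(*p, n) for p in OPTIONS]
        print(n, rms_relative_error(estimates, truths))
```

The `start` argument allows an initial portion of the sequence to be skipped, in the spirit of the remark above that the Sobol’ sequence was started at point 64.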

3.2 One-dimensional vs. higher dimensional sequences

It is sometimes asserted that low-discrepancy methods can be implemented in existing simulation programs by simply replacing the pseudo-random number generator with a low-discrepancy sequence generator. This naive approach can lead to disastrous results, as the following example shows. Consider pricing a European option on the maximum of two non-dividend paying assets with the parameters: $S_1 = S_2 = K = 100$, $\sigma_1 = \sigma_2 = 0.2$, ρ = 0.3, r = 0.05, and T = 1. Under the usual Black–Scholes assumptions, a formula for the price of the option can be derived (see, e.g., Johnson 1987 or Stulz 1982) and gives a price of 16.442. Running one Monte Carlo simulation with 1000 points (hence 2000 random numbers) gave an estimated price of 16.279 with a standard error of 0.533. Using 2000 one-dimensional low-discrepancy values gave a price estimate of 4.320 using the Sobol’ sequence and an estimate of 1.909 using the Faure sequence (starting at point 16).

The cause of the problem can be seen by examining Figures 2–5. Figures 2 and 3 show 1000 two-dimensional Faure and Sobol’ points, respectively. The figures illustrate how the sequences fill the two-dimensional space in regular but different ways. By contrast, Figures 4 and 5 show 2000 one-dimensional Faure and Sobol’ points, respectively, plotted in two dimensions. The plots are created by taking successive points in the one-dimensional sequence to be the (x, y) coordinates in two-dimensional space. In neither figure are the points filling the two-dimensional space (note that the axes do not extend from 0 to 1) and this explains why the price estimates do not converge to the correct values. Even in the quarter of the unit square where the points fall, the points do not uniformly fill the space. This problem is reminiscent of the well-known “collinearity” or “hyperplane” problem of some pseudo-random number generators, but is even more serious with these low-discrepancy sequences.

A similar problem can occur if a high-dimensional low-discrepancy sequence is used for a problem of low dimension. Figure 6 shows the 49th and 50th dimension of 1000 50-dimensional Faure points. Using the last two dimensions of the 50-dimensional sequence to price a two-dimensional option will give very poor results.

Fig. 2. 1000 two-dimensional Faure points. [Scatter plot; both axes run from 0.0 to 1.0.]
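The failure of the naive substitution is easy to reproduce. The sketch below pairs successive values of a one-dimensional sequence into (x, y) coordinates, as described above. It assumes the base-2 radical inverse as the one-dimensional sequence, with first point 0.5; the function names are illustrative only.

```python
def van_der_corput(i):
    """Base-2 radical inverse of the integer i >= 1."""
    x, f = 0.0, 0.5
    while i > 0:
        i, bit = divmod(i, 2)
        x += bit * f
        f *= 0.5
    return x


def paired_points(n_pairs, start=1):
    """Take successive one-dimensional points as (x, y) coordinates."""
    return [
        (van_der_corput(start + 2 * j), van_der_corput(start + 2 * j + 1))
        for j in range(n_pairs)
    ]


def bounding_box(points):
    """Smallest axis-aligned box containing the points: (x_min, x_max, y_min, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), max(xs), min(ys), max(ys)


if __name__ == "__main__":
    pts = paired_points(1000)  # 2000 one-dimensional values
    print(bounding_box(pts))   # the box covers only a quarter of the unit square
```

With `start=1`, every x comes from an odd index (leading binary digit 1, so x ≥ 0.5) and every y from an even index (leading digit 0, so y < 0.5). The pairs therefore never leave the region [0.5, 1) × (0, 0.5), whatever the number of points.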

Fig. 3. 1000 two-dimensional Sobol’ points. [Scatter plot; both axes run from 0.0 to 1.0.]

Fig. 4. 2000 one-dimensional Faure points. [Scatter plot of successive pairs; horizontal axis from 0.50 to 1.00, vertical axis from 0.00 to 0.50.]

Fig. 5. 2000 one-dimensional Sobol’ points. [Scatter plot of successive pairs; horizontal axis from 0.50 to 1.00, vertical axis from 0.00 to 0.50.]

Fig. 6. Coordinates 49 and 50 of 1000 50-dimensional Faure points. [Scatter plot; both axes run from 0.00 to 1.00.]

3.3 Higher dimensional test

To test the effect of problem dimension, we price options in dimensions d = 10, 50, and 100. We price discretely sampled geometric average Asian options, because the problem dimension is easily varied and a closed form solution for the price is available (see Turnbull and Wakeman 1991). The price of a geometric average Asian option is given by

$$C = E[e^{-rT} (\tilde{S} - K)^+],$$

where $\tilde{S} = \left( \prod_{i=1}^{d} S_i \right)^{1/d}$ and $S_i$ is the asset price at time iT/d. We test standard Monte Carlo, Monte Carlo with antithetic variates, and the low-discrepancy sequences of Faure, Sobol’, and Halton.22 For each dimension, we select 500 option parameters at random, and compute RMS relative error (see equation 12) for each method.23 Results for 50 000 and 200 000 sample points are given in Figures 7 and 8, respectively. (The antithetic method uses 25 000 and 100 000 independent pairs of points, respectively.) Results for the Halton sequence were not competitive and are suppressed. RMS error for standard Monte Carlo is nearly independent of the problem dimension. The antithetic method gives minimal variance reduction. The relative advantage, in terms of RMS error, of the low-discrepancy sequences decreases with the problem dimension. For this test problem, the crossover point is beyond dimension 100.

22 We thank Spassimir Paskov and Joseph Traub for providing their code for the Sobol’ sequences.

23 The details of the distribution are given in Broadie and Detemple (1996).

Fig. 7. Results with 50 000 points. [Plot of RMS relative error (in percent) against dimension (10 to 100) for Monte Carlo, Antithetic, Faure, and Sobol’.]

Fig. 8. Results with 200 000 points. [Plot of RMS relative error (in percent) against dimension (10 to 100) for Monte Carlo, Antithetic, Faure, and Sobol’.]
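The geometric average test problem of Section 3.3 can be sketched as follows. The closed form used here is an assumption of this example: it follows from the fact that, under Black–Scholes dynamics with equally spaced dates iT/d, the logarithm of the geometric average is normal with mean ln S0 + (r − σ²/2)T(d+1)/(2d) and variance σ²T(d+1)(2d+1)/(6d²). It is not quoted from the cited reference, and the function names and parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist


def geometric_asian_closed_form(s0, k, r, sigma, t, d):
    """Price of a discretely sampled geometric average call (lognormal average)."""
    mu = math.log(s0) + (r - 0.5 * sigma ** 2) * t * (d + 1) / (2 * d)
    var = sigma ** 2 * t * (d + 1) * (2 * d + 1) / (6 * d * d)
    v = math.sqrt(var)
    d1 = (mu - math.log(k) + var) / v
    d2 = d1 - v
    cdf = NormalDist().cdf
    return math.exp(-r * t) * (math.exp(mu + 0.5 * var) * cdf(d1) - k * cdf(d2))


def geometric_asian_mc(s0, k, r, sigma, t, d, n, seed=0):
    """Standard Monte Carlo estimate and its standard error; each path uses d normals."""
    rng = random.Random(seed)
    dt = t / d
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    discount = math.exp(-r * t)
    total = total_sq = 0.0
    for _ in range(n):
        log_s = math.log(s0)
        log_sum = 0.0
        for _ in range(d):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
            log_sum += log_s
        # exp of the mean log-price is the geometric average of S_1, ..., S_d
        payoff = discount * max(math.exp(log_sum / d) - k, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / n
    std_err = math.sqrt(max(total_sq / n - mean * mean, 0.0) / n)
    return mean, std_err


if __name__ == "__main__":
    params = (100.0, 100.0, 0.10, 0.40, 0.2)
    exact = geometric_asian_closed_form(*params, d=10)
    estimate, se = geometric_asian_mc(*params, d=10, n=20000, seed=1)
    print(exact, estimate, se)
```

For d = 1 the closed form reduces to the Black–Scholes call price, which gives a simple consistency check. Replacing `rng.gauss` with inverse-transformed coordinates of a d-dimensional low-discrepancy point would turn the same loop into a quasi-Monte Carlo estimator.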

4 Estimating price sensitivities

Most of the discussion in this paper centers on the use of Monte Carlo for pricing securities. In practice, the evaluation of price sensitivities is often as important as the evaluation of the prices themselves. Indeed, whereas prices for some securities can be observed in the market, their sensitivities to parameter changes typically cannot and must therefore be computed. Since price sensitivities are important measures of risk, the growing emphasis on risk management systems suggests a greater need for their efficient computation.

The derivatives of a derivative security’s price with respect to various model parameters are collectively referred to as Greeks, because several of these are commonly referred to with the names of Greek letters.24 Perhaps the most important of these – and the one to which we give primary attention – is delta: the derivative of the price of a contingent claim with respect to the current price of an underlying asset. The delta of a stock option, for example, is the derivative of the option price with respect to the current stock price. An option involving multiple underlying assets has multiple deltas, one for each underlying asset.

In the rest of this section, we discuss various approaches to estimating price sensitivities, especially delta. We begin by examining finite-difference approximations and show that these can be improved through the use of common random numbers. We then discuss direct methods that estimate derivatives without requiring resimulation at perturbed parameter values.

24 See, e.g., Chapter 13 of Hull (2000) for background.

4.1 Finite-difference approximations

Consider the problem of computing the delta of the Black–Scholes price of a European call; i.e., computing

$$\Delta = \frac{dC}{dS_0},$$

where C is the option price and $S_0$ is the current stock price. There is, of course, an explicit expression for delta, so simulation is not required, but the example is useful for purposes of illustration. A crude estimate of delta is obtained by generating a terminal stock price

$$S_T = S_0 e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z} \tag{13}$$

(see (2) for notation) from the current stock price $S_0$ and a second, independent terminal stock price

$$S_T(\epsilon) = (S_0 + \epsilon) e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z'} \tag{14}$$

from the perturbed initial price $S_0 + \epsilon$, with Z and Z′ independent. For each terminal price, a discounted payoff can be computed:

$$\hat{C}(S_0) = e^{-rT} \max\{0, S_T - K\}, \qquad \hat{C}(S_0 + \epsilon) = e^{-rT} \max\{0, S_T(\epsilon) - K\}$$

(see (3) for notation). A crude estimate of delta is then provided by the finite-difference approximation

$$\tilde{\Delta} = \epsilon^{-1} [\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)]. \tag{15}$$

By generating n independent replications of $S_T$ and $S_T(\epsilon)$ we can calculate the sample mean of n independent copies of $\tilde{\Delta}$. As n → ∞, this sample mean converges to the true finite-difference ratio

$$\epsilon^{-1} [C(S_0 + \epsilon) - C(S_0)], \tag{16}$$

where C(·) is the option price as a function of the current stock price. This discussion suggests that to get an accurate estimate of Δ we should make ε small. However, because we generated $S_T$ and $S_T(\epsilon)$ independently of each other, we have

$$\mathrm{Var}[\tilde{\Delta}] = \epsilon^{-2} \left( \mathrm{Var}[\hat{C}(S_0 + \epsilon)] + \mathrm{Var}[\hat{C}(S_0)] \right) = O(\epsilon^{-2}),$$

so the variance of $\tilde{\Delta}$ becomes very large if we make ε small. To get an estimator that converges to Δ we must let ε decrease slowly as n increases, resulting in slow overall convergence. A general result of Glynn (1989) shows that the best possible convergence rate using this approach is typically $n^{-1/4}$. Replacing the forward difference estimator in (15) with the central difference $(2\epsilon)^{-1} [\hat{C}(S_0 + \epsilon) - \hat{C}(S_0 - \epsilon)]$ typically improves the optimal convergence rate to $n^{-1/3}$. These rates should be compared with $n^{-1/2}$, the rate ordinarily expected from Monte Carlo.

Better estimators can generally be obtained using the method of common random numbers, which, in this context, simply uses the same Z in (13) and (14). Denote by $\hat{\Delta}$ the finite-difference approximation thus obtained. For fixed ε, the sample mean of independent replications of $\hat{\Delta}$ also converges to (16). The variance of $\hat{\Delta}$ is given by

$$\mathrm{Var}[\hat{\Delta}] = \epsilon^{-2} \left( \mathrm{Var}[\hat{C}(S_0)] + \mathrm{Var}[\hat{C}(S_0 + \epsilon)] - 2\,\mathrm{Cov}[\hat{C}(S_0), \hat{C}(S_0 + \epsilon)] \right),$$

because $\hat{C}(S_0)$ and $\hat{C}(S_0 + \epsilon)$ are no longer independent. Indeed, if they are positively correlated, then $\hat{\Delta}$ has smaller variance than $\tilde{\Delta}$. That they are in fact positively correlated follows from the monotonicity of the function mapping Z to $\hat{C}$ by the argument used in our discussion of antithetics in Section 3. Thus, the use of common random numbers reduces the variance of the estimate of delta.

The impact of this variance reduction is most dramatic when ε is small. A simple calculation shows that, using common random numbers,

$$|\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)| \le |S_T(\epsilon) - S_T| \le \epsilon\, e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z}.$$

Because this upper bound has finite second moment, we may conclude that

$$E[|\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)|^2] = O(\epsilon^2), \tag{17}$$

and therefore that

$$\mathrm{Var}[\epsilon^{-1} \{\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)\}] = O(1);$$

i.e., the variance of $\hat{\Delta}$ remains bounded as ε → 0, whereas we saw previously that the variance of $\tilde{\Delta}$ increases at rate $\epsilon^{-2}$. Thus, the more precisely we try to estimate Δ (by making ε small) the greater the benefit of common random numbers. Moreover, this indicates that to get an estimator that converges to Δ we may let ε decrease faster as n increases than was possible with $\tilde{\Delta}$, resulting in faster overall convergence. An application of Proposition 2 of L’Ecuyer and Perron (1994) shows that a convergence rate of $n^{-1/2}$ can be achieved in this case, and that is the best that can ordinarily be expected from Monte Carlo. For more on convergence rates using common random numbers see Glasserman and Yao (1992), Glynn (1989), and L’Ecuyer and Perron (1994).

The dramatic success of common random numbers in this example relies on the fast rate of mean-square convergence of $\hat{C}(S_0 + \epsilon)$ to $\hat{C}(S_0)$ evidenced by (17). This rate does not apply in all cases. It fails to hold, for example, in the case of a digital option25 paying a fixed amount B if $S_T > K$ and 0 otherwise. The price of this option is $C = e^{-rT} B\, P(S_T > K)$; the obvious simulation estimator is

$$\hat{C}(S_0) = \mathbf{1}_{\{S_T > K\}}\, e^{-rT} B.$$

Because $\hat{C}(S_0)$ and $\hat{C}(S_0 + \epsilon)$ differ only when $S_T \le K < S_T(\epsilon)$, we have

$$E[|\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)|^2] = B^2 e^{-2rT} P(S_T \le K < S_T(\epsilon)) = B^2 e^{-2rT} P(S_T \le K < (1 + \epsilon/S_0) S_T) = O(\epsilon),$$

compared with $O(\epsilon^2)$ for a standard call. As a result, delta estimation is more difficult for the digital option, and a similar argument applies to barrier options generally. Even in these cases, the use of common random numbers can result in substantial improvement compared with differences based on independent runs.

Table 6 compares the performance of four types of delta estimates: forward and central finite-differences with and without common random numbers. 
The methods are compared at four values of the perturbation parameter ε, and applied to the two options discussed above. The values in the table are estimated root mean square errors. The numerical results substantiate the analysis above. Much lower errors are obtained for the standard call than for the digital option, allowing for smaller ε; central differences beat forward differences; common random numbers helps, but it helps the standard call more than the digital option. In several cases, the minimal error is obtained using a fairly large ε. This reflects the fact that the bias resulting from a large ε is sometimes overwhelmed by the large variance resulting from a small ε.

25 Also called a “binary” or “cash-or-nothing” option; see Hull (2000, p. 464).

Table 6. RMS errors for various delta estimation methods.

| Option | ε | Independent, forward | Independent, central | Common, forward | Common, central |
|---|---|---|---|---|---|
| Standard call | 10 | 0.10 | 0.01 | 0.100 | 0.009 |
| Standard call | 1 | 0.18 | 0.09 | 0.012 | 0.006 |
| Standard call | 0.1 | 1.78 | 0.87 | 0.006 | 0.006 |
| Standard call | 0.01 | 7.47 | 8.98 | 0.006 | 0.006 |
| Digital | 20 | 0.51 | 0.37 | 0.51 | 0.37 |
| Digital | 10 | 0.22 | 0.11 | 0.21 | 0.10 |
| Digital | 5 | 0.16 | 0.07 | 0.11 | 0.05 |
| Digital | 1 | 0.67 | 0.34 | 0.14 | 0.10 |

Root mean square error of delta estimates for two options using four methods with various values of ε. Both options have $S_0 = 100$, K = 100, σ = 0.40, r = 0.10, and T = 0.2. The digital option has B = 100. Each entry is computed from 1000 delta estimates, each estimate based on 10 000 replications. The value of delta is 0.580 for the first option and 2.185 for the second.

Although we have discussed common random numbers in only a limited context, it can easily be applied to a wide range of problems. If all stochastic inputs to a simulation are samples from the normal distribution, then common random numbers can be implemented by using the same samples at two different parameter settings. More generally, if the stochastic inputs are all drawn from a sequence of uniform random variates, then common random numbers can be implemented by using the same variates at two different parameter settings.
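The contrast between independent sampling and common random numbers can be reproduced with a short simulation. The sketch below implements the forward difference (15) for the standard call of Table 6 (S0 = K = 100, σ = 0.40, r = 0.10, T = 0.2). The function names, the seed, and the sample size are illustrative assumptions; it is not the experiment behind Table 6.

```python
import math
import random
import statistics


def discounted_call_payoff(s0, z, k, r, sigma, t):
    """Discounted payoff of a call given the normal draw z, as in (13)."""
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - k, 0.0)


def forward_difference_deltas(eps, n, common, s0=100.0, k=100.0, r=0.10,
                              sigma=0.40, t=0.2, seed=0):
    """Return n replications of the forward-difference delta estimate (15).

    common=True reuses the same Z at S0 and S0 + eps (common random numbers);
    common=False draws an independent Z' for the perturbed price.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        z_bumped = z if common else rng.gauss(0.0, 1.0)
        base = discounted_call_payoff(s0, z, k, r, sigma, t)
        bumped = discounted_call_payoff(s0 + eps, z_bumped, k, r, sigma, t)
        samples.append((bumped - base) / eps)
    return samples


if __name__ == "__main__":
    for common in (False, True):
        draws = forward_difference_deltas(eps=0.1, n=20000, common=common)
        label = "common" if common else "independent"
        print(label, statistics.fmean(draws), statistics.variance(draws))
```

With ε = 0.1 the per-replication variance under independent sampling is several orders of magnitude larger than under common random numbers, in line with the O(ε⁻²) versus O(1) comparison above.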

4.2 Direct estimates

Even with the improvements in performance obtained from common random numbers, derivative estimates based on finite differences still suffer from two shortcomings. They are biased (since they compute difference ratios rather than derivatives) and they require multiple resimulations: estimating sensitivities to d parameter changes requires repeatedly running one simulation with all parameters at their base values and d additional simulations with each of the parameters perturbed.


The computation of 10–50 Greeks26 for a single security is not unheard of, and this represents a significant computational burden when multiple resimulations are required.

26 Sensitivities to various changes in the yield curve often account for several of these.

Over the last decade, a variety of direct methods have been developed for estimating derivatives by simulation. Direct methods compute a derivative estimate from a single simulation, and thus do not require resimulation at a perturbed parameter value. Under appropriate conditions, they result in unbiased estimates of the derivatives themselves, rather than of a finite-difference ratio. Our discussion focuses on the use of pathwise derivatives as direct estimates, based on a technique generally called infinitesimal perturbation analysis (see, e.g., Glasserman 1991).

The pathwise estimate of the true delta $dC/dS_0$ is the derivative of the sample price $\hat{C}$ with respect to $S_0$. More precisely, it is

$$\frac{d\hat{C}}{dS_0} = \lim_{\epsilon \to 0} \epsilon^{-1} [\hat{C}(S_0 + \epsilon) - \hat{C}(S_0)],$$

provided the limit exists with probability 1. If $\hat{C}(S_0)$ and $\hat{C}(S_0 + \epsilon)$ are computed from the same Z, then provided $S_T \ne K$, we have

$$\frac{d\hat{C}}{dS_0} = \frac{d\hat{C}}{dS_T} \frac{dS_T}{dS_0} = e^{-rT} \frac{S_T}{S_0} \mathbf{1}_{\{S_T > K\}}. \tag{18}$$

We have used (13) to get

$$\frac{dS_T}{dS_0} = e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T} Z} = \frac{S_T}{S_0},$$

and

$$\frac{d\hat{C}}{dS_T} = e^{-rT} \frac{d}{dS_T} \max\{0, S_T - K\} = \begin{cases} e^{-rT}, & S_T > K; \\ 0, & S_T < K. \end{cases}$$

At $S_T = K$, $\hat{C}$ fails to be differentiable; however, since this occurs with probability zero, the random variable $d\hat{C}/dS_0$ is almost surely well defined.

The pathwise derivative $d\hat{C}/dS_0$ can be thought of as a limiting case of the common random numbers finite-difference estimator in which we evaluate the limit analytically rather than numerically. It is a direct estimator of the option delta because it can be computed directly from a simulation starting at $S_0$ without the need for a separate simulation at a perturbed value of $S_0$. This is evident from the expression in (18). The question remains whether this estimator is unbiased; that is, whether

$$E\left[ \frac{d\hat{C}}{dS_0} \right] = \frac{dC}{dS_0} \equiv \frac{d}{dS_0} E[\hat{C}].$$

The unbiasedness of the pathwise estimate thus reduces to the interchangeability of derivative and expectation. The interchange is easily justified in this case; see Broadie and Glasserman (1996) for this example and conditions for more general cases. Applying the same reasoning used above, we obtain the following pathwise estimators of three other Greeks for the Black–Scholes price:

Rho (dC/dr): $K T e^{-rT} \mathbf{1}_{\{S_T \ge K\}}$

Vega (dC/dσ): $e^{-rT} \mathbf{1}_{\{S_T \ge K\}} \dfrac{S_T}{\sigma} \left( \ln(S_T/S_0) - (r + \tfrac{1}{2}\sigma^2) T \right)$

Theta (−dC/dT): $r e^{-rT} \max(S_T - K, 0) - \mathbf{1}_{\{S_T \ge K\}} e^{-rT} \dfrac{S_T}{2T} \left( \ln(S_T/S_0) + (r - \tfrac{1}{2}\sigma^2) T \right)$

Each of these estimators is unbiased. Of course, Monte Carlo estimators are not required for these derivatives because closed-form expressions are available for each. The Black–Scholes setting is useful for illustration, but the utility of the technique rests on its applicability to more general models. In Broadie and Glasserman (1996), pathwise estimates are derived and studied (both theoretically and numerically) for Asian options and a model with stochastic volatility. For example, the Asian-option delta estimate is simply

$$e^{-rT} \frac{\bar{S}}{S_0} \mathbf{1}_{\{\bar{S} > K\}},$$

where $\bar{S}$ is the average asset price used to determine the option payoff. Evaluating this expression takes negligible time compared with resimulating to estimate the option price from a perturbed initial stock price. The pathwise estimate is thus both more accurate and faster to compute than the finite-difference approximation. These advantages extend to a wide class of problems.

As already noted, the unbiasedness of pathwise derivative estimates depends on an interchange of derivative and expectation. In practice, this generally means that the security payoff should be a pathwise continuous function of the parameter in question. The standard call option payoff $e^{-rT} \max\{0, S_T - K\}$ is continuous in each of its parameters. An example where continuity fails is a digital option with payoff $e^{-rT} \mathbf{1}_{\{S_T > K\}} B$, with B the amount received if the stock finishes in the money.27 Because of the discontinuity at $S_T = K$, the pathwise method (in its simplest form) cannot be applied to this type of option.

The problem of discontinuities often arises in the estimation of gamma, the second derivative of an option price with respect to the current price of an underlying asset. Consider, again, the standard European call option. We have an expression for $d\hat{C}/dS_0$ in (18) involving the indicator $\mathbf{1}_{\{S_T > K\}}$. This shows that $d\hat{C}/dS_0$ is discontinuous in $S_T$, preventing us from differentiating pathwise a second time to get a direct estimator of gamma.

To address the problem of discontinuities, Broadie and Glasserman (1996) construct smoothed estimators. These estimators are unbiased, but not as simple to derive and implement as ordinary pathwise estimators. Broadie and Glasserman also investigate another technique for direct derivative estimation called the likelihood ratio method. This method differentiates the probability density of an asset price, rather than the outcome of the asset price itself.28 The domains of this method and the pathwise method overlap, but neither contains the other. When both apply, the pathwise method generally has lower variance.

Overviews of these methods can be found in Glasserman (1991), Glynn (1987), and Rubinstein and Shapiro (1993). For discussions specific to financial applications see Broadie and Glasserman (1996) and Fu and Hu (1995).
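The pathwise estimators for the Black–Scholes call need only the simulated terminal price, so they can be accumulated inside the pricing loop. The sketch below is a minimal illustration, not the implementation of the cited papers. It computes the pathwise delta from (18) and a pathwise vega obtained by differentiating (13) with respect to σ, which gives dS_T/dσ = S_T(−σT + √T Z) = (S_T/σ)(ln(S_T/S0) − (r + σ²/2)T). It then compares both with the closed-form Black–Scholes delta and vega. Function names, seed, and sample size are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def pathwise_greeks(n, s0=100.0, k=100.0, r=0.10, sigma=0.40, t=0.2, seed=0):
    """Pathwise delta and vega of a European call from a single simulation."""
    rng = random.Random(seed)
    discount = math.exp(-r * t)
    delta_sum = vega_sum = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
        if s_t > k:  # indicator 1{S_T > K}; both estimators vanish otherwise
            delta_sum += discount * s_t / s0
            vega_sum += discount * (s_t / sigma) * (
                math.log(s_t / s0) - (r + 0.5 * sigma ** 2) * t
            )
    return delta_sum / n, vega_sum / n


def black_scholes_delta_vega(s0, k, r, sigma, t):
    """Closed-form delta and vega, used here only as a benchmark."""
    normal = NormalDist()
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return normal.cdf(d1), s0 * normal.pdf(d1) * math.sqrt(t)


if __name__ == "__main__":
    print(pathwise_greeks(50000))
    print(black_scholes_delta_vega(100.0, 100.0, 0.10, 0.40, 0.2))
```

No perturbed simulation is run: each Greek costs a few extra arithmetic operations per path, which is the practical advantage over finite differences described above.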

²⁷ We used this example at the end of Section 3. The settings are related: problems for which common random numbers is particularly effective are generally problems to which the pathwise method can be applied even more effectively.
²⁸ Though not presented in a Monte Carlo context, the expressions in Carr (1993) are potentially relevant to this approach.

5 Pricing American options by simulation

European contingent claims have cash flows that cannot be influenced by decisions of the owner. Examples include European options, barrier options, and many types of swaps. By contrast, the cash flows of American contingent claims depend both on the price path of the underlying asset or assets and the decisions of the owner. Many types of American contingent claims trade on exchanges and in the over-the-counter market. Examples include American options, American swaptions, shout options, and American Asian options. They also arise in other contexts, for example as “real options” in the theory of economic investment described in Dixit and Pindyck (1994).

To be concrete, suppose that we wish to estimate the quantity $\max_{\tau} E[e^{-r\tau} h(S_\tau)]$, where $r$ is the constant riskless interest rate, $h(S_\tau)$ is the payoff at time $\tau$ in state $S_\tau$, and the max is taken over all stopping times $\tau \le T$. This formulation of the American pricing problem will suffice to illustrate the major points. First, note that the state can be vector-valued and hence


P. Boyle, M. Broadie and P. Glasserman

applies to pricing American options on multiple assets. Second, since simulation algorithms are discrete in nature, the continuous-time exercise decision must be approximated by restricting the exercise opportunities to lie in a finite set of times $0 = t_0 < t_1 < \cdots < t_d = T$. This is not always a serious restriction. For example, for a call option on a stock which pays dividends at discrete points in time, it can be shown that early exercise can be optimal only just prior to the ex-dividend dates. In other cases, Richardson or other extrapolation techniques can be used to better approximate the price with exercise in continuous time from a finite set of exercise opportunities.²⁹ However, we now restrict attention to estimating the quantity

$$P \equiv \max_{\tau} E[e^{-r\tau} h(S_\tau)], \qquad (19)$$

where the max is taken over all stopping times $\tau$ taking values in the set $\{t_i : i = 0, \ldots, d\}$. The need to estimate an optimal stopping time is the crucial distinction between American and European pricing problems. If the state space is of low dimension, say three or less, a discretization scheme together with a dynamic programming algorithm can often be used to numerically approximate the value in (19). Even in these cases, simulation can be used to estimate the expectation in the recursive step. Simulation-based methods become essential when the dimension of the state space is large.

An obvious simulation-based algorithm for estimating the quantity $P$ in equation (19) is to generate a random path of states $S_{t_i}$, for $i = 1, \ldots, d$, and form the path estimate

$$\hat{P} = \max_{i=0,\ldots,d} e^{-r t_i} h(S_{t_i}).$$

However, this estimator corresponds to using perfect foresight, and so it is biased high. That is, $E[\hat{P}] \ge P$, which follows immediately from the inequality $\max_{i=0,\ldots,d} e^{-r t_i} h(S_{t_i}) \ge e^{-r\tau} h(S_\tau)$. A natural goal would be to develop an alternative unbiased estimator. A negative result in this regard is provided in Broadie and Glasserman (1997): among a large class of estimators, there is no unbiased estimator of $P$. In particular, the estimators proposed in Tilley (1993), Grant, Vora, and Weeks (1997), and Barraquand and Martineau (1995) are all biased. Unfortunately, these methods provide no way to estimate the extent of the bias or to correct for the bias in a general setting. Broadie and Glasserman (1997) circumvent this problem by developing two estimators, one biased high and one biased low (but both asymptotically unbiased), which can be used together to form a valid confidence interval for the quantity $P$. In the remainder of this section, we give brief descriptions of the four methods mentioned and describe some strengths and weaknesses of each.

²⁹ Geske and Johnson (1984) gave the first financial application of Richardson extrapolation. An extensive treatment of extrapolation techniques is given in Marchuk and Shaidurov (1983).
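The high bias of the perfect-foresight estimator can be checked numerically. The sketch below assumes a put payoff on a single asset following geometric Brownian motion with equally spaced exercise dates; the function names, the barrier-type exercise rule and all parameter values are hypothetical choices made for illustration. Because the path maximum dominates the discounted payoff at every exercise date, the foresight average is at least as large as the average obtained under any stopping rule evaluated on the same paths.

```python
import numpy as np


def simulate_paths(s0, r, sigma, maturity, d, n_paths, seed=0):
    """Return an (n_paths, d + 1) array of GBM prices at t_0, ..., t_d."""
    rng = np.random.default_rng(seed)
    dt = maturity / d
    z = rng.standard_normal((n_paths, d))
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_increments, axis=1)], axis=1
    )
    return s0 * np.exp(log_paths)


def discounted_put_payoffs(paths, strike, r, maturity):
    """Discounted exercise values e^{-r t_i} h(S_{t_i}) for a put, path by path."""
    d = paths.shape[1] - 1
    times = np.linspace(0.0, maturity, d + 1)
    return np.exp(-r * times) * np.maximum(strike - paths, 0.0)


def foresight_estimate(payoffs):
    """Average of the path maxima: the perfect-foresight estimator."""
    return payoffs.max(axis=1).mean()


def stopping_rule_estimate(payoffs, paths, barrier):
    """Exercise the first time the price is at or below `barrier`, else at t_d.

    The rule looks only at the current price, so it is a genuine stopping time.
    """
    hit = paths <= barrier
    hit[:, -1] = True                  # forced exercise decision at the last date
    first = hit.argmax(axis=1)         # index of the first True in each row
    return payoffs[np.arange(len(paths)), first].mean()


if __name__ == "__main__":
    paths = simulate_paths(100.0, 0.05, 0.2, 1.0, 10, 50_000, seed=2)
    payoffs = discounted_put_payoffs(paths, 100.0, 0.05, 1.0)
    print("perfect foresight   :", foresight_estimate(payoffs))
    print("barrier rule at 85  :", stopping_rule_estimate(payoffs, paths, 85.0))
    print("exercise only at t_d:", stopping_rule_estimate(payoffs, paths, 0.0))
```

Setting the barrier to zero reduces the rule to exercise at $t_d$ only, so the last line is a European-style estimate on the same paths.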


5.1 Tilley’s bundling algorithm

Tilley (1993) sparked considerable interest by demonstrating the potential practicality of applying simulation to pricing American contingent claims. Tilley describes a “bundling procedure” for pricing an American option on a single underlying asset. To estimate $P$ he suggests simulating $n$ paths of asset prices, denoted $S_{t_i}(j)$ for $i = 1, \ldots, d$ and $j = 1, \ldots, n$, in the usual way. Next, partition the asset price space and call the paths which fall into a given partition at a fixed time a “bundle.” A dynamic programming algorithm is applied to bundles to estimate $P$. In particular, the estimated option price $P_{t_i}(j)$ at time $t_i$ for path $j$ is the maximum of the immediate exercise value, $h(S_{t_i}(j))$, and the present value of continuing. The latter value is defined to be the average of $e^{-r(t_{i+1} - t_i)} P_{t_{i+1}}(k)$ over all paths $k$ which fall in the bundle containing path $j$ at time $t_i$. Details of the partitioning are given in Tilley (1993).

In order to implement the algorithm, all paths must be stored so they can be sorted into bundles at each time step. Since simulation typically requires a large number of paths for good estimates, the storage and sorting requirements can be significant. More importantly, the algorithm does not easily generalize to multiple state variables. In higher dimensions, it is not clear how to define the bundles. Even if bundles can be defined, it is likely that either most partitions will contain very few paths and lead to a large bias, or the partitions will be so large that the continuation values are poorly estimated. Because Tilley’s algorithm uses the same paths to estimate the optimal decisions and the value, the estimator tends to be biased high (although the bundling induces an approximation which is difficult to analyze). Tilley introduces a “sharp boundary” variant which reduces the bias, but this variant does not easily generalize to higher dimensions.
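The bundling recursion described above can be sketched in a few lines. This is a simplified reading, not Tilley's full procedure: it assumes a put on a single asset under geometric Brownian motion, forms bundles by sorting the paths on price at each date and cutting them into groups of (nearly) equal size, and omits the “sharp boundary” refinement. The function name and all parameter values are hypothetical.

```python
import numpy as np


def tilley_bundling_put(s0, strike, r, sigma, maturity, d, n_paths, n_bundles, seed=0):
    """Simplified bundling estimate of an American put with d exercise dates."""
    rng = np.random.default_rng(seed)
    dt = maturity / d
    z = rng.standard_normal((n_paths, d))
    log_steps = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(log_steps, axis=1))   # column i holds S at t_{i+1}
    discount = np.exp(-r * dt)

    # At the final date the option is worth its exercise value on every path.
    value = np.maximum(strike - paths[:, -1], 0.0)

    # Work backwards through t_{d-1}, ..., t_1 (columns d-2, ..., 0).
    for i in range(d - 2, -1, -1):
        order = np.argsort(paths[:, i])                  # all paths are kept and sorted
        continuation = np.empty(n_paths)
        for bundle in np.array_split(order, n_bundles):
            # Present value of continuing: discounted bundle average of next-date values.
            continuation[bundle] = discount * value[bundle].mean()
        exercise = np.maximum(strike - paths[:, i], 0.0)
        value = np.maximum(exercise, continuation)

    # At t_0 every path sits in the same state, so there is a single bundle.
    return max(strike - s0, 0.0, discount * value.mean())


if __name__ == "__main__":
    price = tilley_bundling_put(36.0, 40.0, 0.06, 0.2, 1.0,
                                d=10, n_paths=20_000, n_bundles=50, seed=3)
    print("bundling estimate:", price)
```

The need to hold the whole `paths` array and to sort it at each date is the storage and sorting cost mentioned in the text. Note also that the same paths determine both the exercise decision and the value, which is the source of the upward tendency in the estimate.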
Carriere (1996) contains further analysis of Tilley’s algorithm and suggests a procedure based on spline functions to reduce the bias. It remains to be seen whether the spline procedure is practical for higher dimensional problems. Nevertheless, for single state variable problems, Tilley demonstrated the potential practicality of applying simulation to American-style pricing problems.

5.2 Barraquand and Martineau’s stratified state aggregation (SSA) algorithm

Barraquand and Martineau (1995) propose a partitioning algorithm, but unlike Tilley’s bundling algorithm, they partition the payoff space instead of the state space. Hence, only a one dimensional space is partitioned at each time step, independent of the number of state variables.³⁰ Their algorithm works as follows.

³⁰ In fact, they distinguish between partitioning the state space, which they term “stratified state aggregation,” and partitioning the payoff space, which they term “stratified state aggregation along the payoff.” The latter method is the only one that they test or specify in detail. Hence we focus our discussion on this variant of their method.


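Fig. 9. State evolution of $(S_1, S_2)$ over times $t_0, t_1, t_2$: from (8, 6) at $t_0$ the state moves to (8, 8) or (8, 4) at $t_1$, each with probability 1/2; from (8, 8) it moves to (14, 2) or (2, 14) at $t_2$, each with probability 1/2; from (8, 4) it moves to (4, 2).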

First, partition the payoff space into $K$ disjoint cells. Then simulate $n$ paths of asset prices, denoted $S_{t_i}(j)$ for $i = 1, \ldots, d$ and $j = 1, \ldots, n$, in the usual way. For each payoff cell $k$ at time $t_i$, record the number of paths, $a_{t_i}(k)$, which fall into the cell. For each pair of cells $k$ and $l$ at consecutive times $t_i$ and $t_{i+1}$, record the number of paths, $b_{t_i}(k, l)$, which fall into both cells. Also, for each cell $k$ at time $t_i$, record the sum of the payoff values, $c_{t_i}(k) = \sum_j h(S_{t_i}(j))$, where the sum is over all paths $j$ which fall into cell $k$ at time $t_i$. The transition probability from $(t_i, k)$ to $(t_{i+1}, l)$ is approximated by $p_{t_i}(k, l) = b_{t_i}(k, l)/a_{t_i}(k)$. The estimated option price $P_{t_i}(k)$ at time $t_i$ in cell $k$ is the maximum of the immediate exercise value and the present value of continuing. The immediate exercise value is approximated by $c_{t_i}(k)/a_{t_i}(k)$. The present value of continuing is approximated by $e^{-r(t_{i+1} - t_i)} \sum_{l=1}^{K} p_{t_i}(k, l) P_{t_{i+1}}(l)$. This procedure can be applied backwards in time to determine the simulation estimate of the price $P$. Details of a payoff space partitioning scheme are given in Barraquand and Martineau (1995).

Once a single path is generated and the summary information $a$, $b$, and $c$ is recorded, the path can be discarded. Hence the storage requirements with this method are modest: on the order of $K^2 d$.

One drawback of this method is a possible lack of convergence, as the following example illustrates. Figure 9 shows the evolution of two asset prices $(S_1, S_2)$. The option payoff is $h(S_1, S_2) = \max(S_1, S_2)$ and for convenience the riskless rate is taken to be zero. Using the risk-neutral probabilities in Figure 9, the true value of the option at time $t_0$ is 11, which at time $t_1$ involves exercise in state (8, 4) but continuing in state (8, 8). When the states are partitioned by their payoffs, these two states are indistinguishable.
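The two-asset example is small enough to run the payoff-partitioning recursion on it directly. The sketch below makes two assumptions that should be read with care: the tree is our reading of Figure 9 ((8, 6) at $t_0$; (8, 8) or (8, 4) at $t_1$; (14, 2) or (2, 14) after (8, 8), and (4, 2) after (8, 4)), and each distinct payoff value is treated as its own cell, which is the finest possible payoff partition. All names are hypothetical. The riskless rate is zero, as in the example, so no discounting appears.

```python
import random
from collections import defaultdict

# Tree as read off Figure 9 (an assumption about the figure's layout).
ROOT = (8, 6)
CHILDREN = {
    (8, 6): [((8, 8), 0.5), ((8, 4), 0.5)],
    (8, 8): [((14, 2), 0.5), ((2, 14), 0.5)],
    (8, 4): [((4, 2), 1.0)],
}


def payoff(state):
    return max(state)


def true_value(state):
    """Exact dynamic programming on the full state tree."""
    exercise = payoff(state)
    if state not in CHILDREN:
        return exercise
    continuation = sum(p * true_value(child) for child, p in CHILDREN[state])
    return max(exercise, continuation)


def simulate_path(rng):
    path = [ROOT]
    while path[-1] in CHILDREN:
        children, probs = zip(*CHILDREN[path[-1]])
        path.append(rng.choices(children, weights=probs)[0])
    return path


def ssa_value(n_paths, seed=0):
    """Payoff-partitioned estimate built from the counts a, b and sums c."""
    rng = random.Random(seed)
    a = defaultdict(int)     # a[(i, k)]: paths in payoff cell k at time t_i
    b = defaultdict(int)     # b[(i, k, l)]: paths in cell k at t_i and cell l at t_{i+1}
    c = defaultdict(float)   # c[(i, k)]: sum of payoffs over paths in cell k at t_i
    for _ in range(n_paths):
        cells = [payoff(state) for state in simulate_path(rng)]
        for i, k in enumerate(cells):
            a[(i, k)] += 1
            c[(i, k)] += k
            if i + 1 < len(cells):
                b[(i, k, cells[i + 1])] += 1   # the path itself can now be discarded

    n_times = 3
    value = {}
    for i in reversed(range(n_times)):
        cells_now = [k for (t, k) in a if t == i]
        cells_next = [l for (t, l) in a if t == i + 1]
        for k in cells_now:
            exercise = c[(i, k)] / a[(i, k)]
            continuation = sum(
                b[(i, k, l)] / a[(i, k)] * value[(i + 1, l)] for l in cells_next
            )
            value[(i, k)] = max(exercise, continuation)
    return value[(0, payoff(ROOT))]


if __name__ == "__main__":
    print("full state tree :", true_value(ROOT))
    print("payoff partition:", ssa_value(20_000, seed=4))
```

The first number is the exact value on the full tree; the second settles near a different value however many paths are used, because states (8, 8) and (8, 4) share the payoff cell 8 at $t_1$.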
As seen in the payoff evolution in Figure 10, the best strategy at time $t_1$ in payoff state 8 is to continue. The apparent value of the option in Figure 10 is 9 (= (1/2)14 + (1/2)4). In this example, partitioning the payoff space leads to a significant underestimate of the option value. Hence, a simulation algorithm based on partitioning the payoff space cannot converge to the correct value.

Fig. 10. Payoff evolution of $h(S_1, S_2)$ over times $t_0, t_1, t_2$: payoff 8 at $t_0$, payoff 8 at $t_1$, and payoff 14 or 4 at $t_2$, each with probability 1/2.

Although this example may seem contrived, Broadie and Detemple (1997) show that the payoff value is not a sufficient statistic for determining the optimal exercise decision for options on the maximum of several assets. Indeed, the payoff process $h(S_t)$ is hardly ever Markovian.

There is currently no way to bound the error in the Barraquand and Martineau method. Without an error estimate, it is difficult to determine the appropriate number of paths to simulate or the appropriate number of partitions to use. Their method can be slightly modified to generate an option price estimate which is biased low as follows. Their procedure gives an exercise strategy based on the immediate exercise payoff. Using this strategy, a new (independent) set of paths can be simulated, and an option value can be estimated under the exercise strategy previously estimated. The resulting option price estimate will be biased low because the exercise policy is not, in general, the optimal policy. With this modification, the average direction of the error is known. Raymar and Zwecher (1997) extend the Barraquand and Martineau approach by basing the exercise decision on a partition of two state-variables, rather than one.

5.3 Broadie and Glasserman’s random tree algorithm

Broadie and Glasserman (1997) propose an algorithm based on simulated trees. In order to handle the bias problem, they develop two estimators, one biased high and one biased low, but both convergent and asymptotically unbiased as the computational effort increases. A valid confidence interval for the true value $P$ is obtained by taking the upper confidence limit from the “high” estimator and the lower confidence limit from the “low” estimator. Briefly, their algorithm works as follows.


First, simulate a tree of asset prices (or, more generally, state variables) using $b$ branches at each node. Two paths emanating from a node evolve as independent copies of the state process. The high estimator, $\Theta$, is defined to be the value obtained by the usual dynamic programming algorithm applied to the simulated tree. Then repeat the process for $n$ trees, and compute a point estimate and confidence interval for $E[\Theta]$. A low estimator is obtained by modifying the dynamic programming algorithm at each node. Instead of using all $b$ branches to determine the decision and value, $b_1$ branches are used to determine the exercise decision, and the remaining $b_2 = b - b_1$ branches are used to determine the continuation value. Their actual low estimator, $\theta$, includes another modification of this procedure which reduces the variance of the estimate. As before, estimates from $n$ trees are combined to give a point estimate and confidence interval for $E[\theta]$. Details of the procedure can be found in Broadie and Glasserman (1997).

For the $\Theta$ estimator, all of the branches at a given node are used to determine the optimal decision and the corresponding node value, and this leads to an upward bias, i.e., $E[\Theta] \ge P$. For the $\theta$ estimator, the decision and the continuation value are determined from independent information sets. This eliminates the upward bias, but a downward bias occurs, i.e., $E[\theta] \le P$. The intuition for this result follows. If the correct decision is inferred at a node, the node value estimate would be unbiased. If the incorrect decision is inferred at a node, the node value estimate would be biased low because of the suboptimality of the decision. The expected node value is a weighted average of an unbiased estimate (based on the correct decision) and an estimate which is biased low (based on the incorrect decision). The net effect is an estimate which is biased low. Both estimators are consistent and asymptotically unbiased as $b$ increases.
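A compact sketch of the two tree estimators follows, under assumptions that go beyond the description above. The payoff is a put on a single asset under geometric Brownian motion with equally spaced exercise dates. For the low estimator the code takes $b_1 = b - 1$ and $b_2 = 1$ and averages over the $b$ possible choices of the held-out branch; this leave-one-out averaging is one way to separate decision from value while reducing variance, and it is an assumption of the sketch rather than a statement of the authors' exact $\theta$. Function names and parameter values are hypothetical.

```python
import math
import random


def random_tree_estimates(s0, strike, r, sigma, maturity, d, b, n_trees, seed=0):
    """Return (mean high estimate, mean low estimate) over n_trees simulated trees."""
    rng = random.Random(seed)
    dt = maturity / d
    discount = math.exp(-r * dt)
    drift = (r - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)

    def node(s, i):
        """High and low value estimates at state s and exercise date t_i."""
        exercise = max(strike - s, 0.0)
        if i == d:
            return exercise, exercise
        # b independent successors of the current state.
        children = [
            node(s * math.exp(drift + vol * rng.gauss(0.0, 1.0)), i + 1)
            for _ in range(b)
        ]
        cont_high = [discount * high for high, _ in children]
        cont_low = [discount * low for _, low in children]

        # High estimator: ordinary dynamic programming, all branches used twice.
        high = max(exercise, sum(cont_high) / b)

        # Low estimator: b - 1 branches decide, the held-out branch supplies the value.
        total = sum(cont_low)
        low = 0.0
        for held_out in cont_low:
            decision_value = (total - held_out) / (b - 1)
            low += exercise if exercise >= decision_value else held_out
        return high, low / b

    results = [node(s0, 0) for _ in range(n_trees)]
    mean_high = sum(high for high, _ in results) / n_trees
    mean_low = sum(low for _, low in results) / n_trees
    return mean_high, mean_low


if __name__ == "__main__":
    high, low = random_tree_estimates(100.0, 100.0, 0.05, 0.2, 1.0,
                                      d=3, b=5, n_trees=300, seed=5)
    print("high estimate:", high)
    print("low estimate :", low)
```

Each tree contains on the order of $b^d$ nodes, so the running time grows quickly with the number of exercise dates.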
The computational effort with this algorithm is of order $n b^d$, and its main drawback is that $d$ cannot be too large for practical computations. Broadie and Glasserman (1997) give numerical results for options with $d = 4$. As mentioned earlier, to approximate option values with continuous exercise opportunities, some type of extrapolation procedure is required. Special care is necessary to implement extrapolation procedures within a simulation context because of the randomness in the estimates.

5.4 Other developments³¹

Grant, Vora, and Weeks (1997) describe a method specially designed to price American arithmetic Asian options on a single underlying asset. In this application the optimal exercise decision depends on the current asset price and the current value of the average. Using repeated simulation runs, they attempt to identify the form of an optimal exercise policy based on these two pieces of information. Once an exercise policy is specified, simulation is used to estimate the option value under this fixed policy. Since the fixed policy is a suboptimal approximation to the optimal stopping rule, their procedure leads to a simulation estimator which is biased low. GVW perform extensive sensitivity analysis which indicates that their option value estimate is relatively insensitive to deviations in the chosen exercise policy. So it may be that their method gives good option price estimates relative to some accuracy level, but it is not clear how to quantify their error. Nor is it clear how to improve their estimates to an arbitrary accuracy level as the simulation effort increases. Their procedure is specific to the case of American Asian options and does not at this point constitute a general approach to pricing American contingent claims.

Bossaerts (1989) proposes two estimators of optimal early exercise, a moment estimator and a smooth optimization estimator, and studies their convergence properties. His method appears to require a parametric representation of the exercise boundary and may therefore face difficulties in higher dimensions. The optimization approach described in Fu and Hu (1995) also requires a parametric representation.

Rust (1997)³² studies the general problem of solving discrete decision problems, which include optimal stopping problems as a special case. He develops a Monte Carlo method and shows that it succeeds in breaking the “curse of dimensionality” in these problems. Rust’s focus is on computational complexity, but his approach appears to provide a promising direction for finance applications.

³¹ More recent developments in pricing American options by simulation include Broadie and Glasserman (1997), Broadie, Glasserman and Ha (2000) and Longstaff and Schwartz (2001).
³² We thank A. Dixit for pointing us to this reference.

5.5 Summary

The valuation of securities with American-type features requires the determination of optimal decisions. High dimension versions of these problems arise from multiple state variables and/or path dependencies. Although simulation is a powerful tool for solving some higher dimensional problems, conventional wisdom was that simulation could not be applied to American-style pricing problems. The algorithms described here represent the first attempts to solve these problems that were long thought to be computationally intractable.

6 Further topics

We conclude this paper with a brief mention of two important areas of current work in the application of Monte Carlo methods to finance, not discussed in this article.


A central numerical issue in simulating interest rates, asset prices with stochastic volatilities, and other complex diffusions is the accurate approximation of stochastic differential equations by discrete-time processes. Kloeden and Platen (1992) discuss a variety of methods for constructing discrete-time approximations with different orders of convergence. Andersen (1995) applies some of these to interest-rate models. In general, decreasing the time increment in a discrete approximation can be expected to give more accurate results, but at the expense of greater computational effort. Duffie and Glynn (1995) analyze this trade-off and characterize asymptotically optimal time steps as the overall computational effort grows.

In this article we have focused almost exclusively on the use of Monte Carlo for pricing. A related, growing area of application is risk management – in particular, the use of Monte Carlo to assess value at risk, credit risk, and related measures. For some examples of recent applications in these areas see Iben and Brotherton-Ratcliffe (1994), Lawrence (1994), Beckström and Campbell (1995) and Glasserman, Heidelberger and Shahabuddin (2000).
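The trade-off between time increment and accuracy raised at the start of this section can be seen in a small experiment. The sketch below applies the Euler scheme to geometric Brownian motion, chosen only because its exact solution is available for comparison on the same Brownian increments; the function name and parameter values are hypothetical, and the experiment is not drawn from the works cited above.

```python
import numpy as np


def euler_strong_error(s0, r, sigma, maturity, n_steps, n_paths, seed=0):
    """Mean absolute gap at T between an Euler path and the exact GBM solution."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    dw = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)   # Brownian increments

    # Euler scheme: S_{k+1} = S_k + r S_k dt + sigma S_k dW_k.
    s = np.full(n_paths, s0, dtype=float)
    for k in range(n_steps):
        s = s + r * s * dt + sigma * s * dw[:, k]

    # Exact solution driven by the same increments.
    exact = s0 * np.exp((r - 0.5 * sigma**2) * maturity + sigma * dw.sum(axis=1))
    return np.mean(np.abs(s - exact))


if __name__ == "__main__":
    for n_steps in (4, 16, 64):
        err = euler_strong_error(100.0, 0.05, 0.3, 1.0, n_steps, 20_000, seed=6)
        print(n_steps, "steps: mean absolute error", err)
```

The error shrinks as the number of steps grows, while the work per path grows in proportion to the number of steps, which is the trade-off in question.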

Appendix: Moment controls beat moment matching asymptotically

As mentioned in Section 2.4, any time a moment is available for use with moment matching, it can alternatively be used as a control variate. In this appendix, we argue that moment matching is asymptotically equivalent to a control variate technique with suboptimal coefficients, and is therefore dominated by the optimal use of moments as controls. This asymptotic link applies in large samples. A related link between linear and nonlinear control variates is made in Glynn and Whitt (1989), but the current setting does not fit their framework.

Let $Z_1, Z_2, \ldots$ be i.i.d. (not necessarily normal) with mean $\mu$ and variance $\sigma^2$. Let $s$ denote the sample standard deviation of $Z_1, \ldots, Z_n$ and $\bar{Z}$ their sample mean. Suppose we want to estimate $E[f(Z)]$ for some function $f$. The standard estimator is $n^{-1} \sum_{i=1}^{n} f(Z_i)$ and the moment matching estimator is $n^{-1} \sum_{i=1}^{n} f(\tilde{Z}_i)$ with $\tilde{Z}_i$ defined in (9). For each $i$, the scaled difference

$$\sqrt{n}\,(\tilde{Z}_i - Z_i) = \sqrt{n}\,\frac{\sigma - s}{s}\,Z_i - \sqrt{n}\,[(\sigma \bar{Z}/s) - \mu]$$

converges in distribution, by the central limit theorem for $\bar{Z}$ and $s$. Thus, $(\tilde{Z}_i - Z_i) = O_p(n^{-1/2})$ (see, e.g., Appendix A of Pollard 1984 for $O_p$, $o_p$ notation). Suppose now that, with probability one, $f$ is differentiable at $Z_i$. Then

$$f(\tilde{Z}_i) = f(Z_i) + f'(Z_i)[\tilde{Z}_i - Z_i] + o_p(n^{-1/2}),$$

suggesting that up to terms $o_p(n^{-1/2})$ the moment matching estimator and standard


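estimator are related via

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} f(\tilde{Z}_i)
&\approx \frac{1}{n}\sum_{i=1}^{n} f(Z_i) + \frac{1}{n}\sum_{i=1}^{n} f'(Z_i)[\tilde{Z}_i - Z_i] \\
&= \frac{1}{n}\sum_{i=1}^{n} f(Z_i) + \frac{1}{n}\sum_{i=1}^{n} f'(Z_i)\left[\left(\frac{\sigma}{s} - 1\right) Z_i - \frac{\sigma}{s}\bar{Z} + \mu\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} f(Z_i) + \left(\frac{1}{n}\sum_{i=1}^{n} f'(Z_i) Z_i\right)\left(\frac{\sigma}{s} - 1\right) + \left(\frac{1}{n}\sum_{i=1}^{n} f'(Z_i)\right)\left(\mu - \frac{\sigma}{s}\bar{Z}\right) \\
&\equiv \frac{1}{n}\sum_{i=1}^{n} f(Z_i) + \hat{\beta}_1\left(\frac{\sigma}{s} - 1\right) + \hat{\beta}_2\left(\mu - \frac{\sigma}{s}\bar{Z}\right),
\end{aligned}$$

where $\hat{\beta}_i \to \beta_i$, $i = 1, 2$, as $n \to \infty$, with $\beta_1 = E[f'(Z)Z]$ and $\beta_2 = E[f'(Z)]$.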
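The linearization can be checked numerically. The sketch below assumes the moment-matched sample takes the form $\tilde{Z}_i = (Z_i - \bar{Z})\sigma/s + \mu$, which is the form implied by the scaled difference displayed earlier; it also assumes normal draws and a smooth test function, both chosen only for illustration. The function name and inputs are hypothetical.

```python
import numpy as np


def moment_matching_check(f, f_prime, mu, sigma, n, seed=0):
    """Return the standard, moment-matched and linearized estimators of E[f(Z)]."""
    rng = np.random.default_rng(seed)
    z = rng.normal(mu, sigma, n)
    z_bar = z.mean()
    s = z.std(ddof=1)                                  # sample standard deviation

    # Moment-matched sample (assumed form): sample mean mu, sample std sigma.
    z_tilde = (z - z_bar) * sigma / s + mu

    standard = f(z).mean()
    matched = f(z_tilde).mean()

    # Estimated coefficients and the two controls from the last line of the display.
    beta1_hat = (f_prime(z) * z).mean()
    beta2_hat = f_prime(z).mean()
    linearized = (standard
                  + beta1_hat * (sigma / s - 1.0)
                  + beta2_hat * (mu - sigma * z_bar / s))
    return standard, matched, linearized


if __name__ == "__main__":
    f = lambda z: np.exp(0.5 * z)
    f_prime = lambda z: 0.5 * np.exp(0.5 * z)
    standard, matched, linearized = moment_matching_check(f, f_prime, 0.0, 1.0, 100_000, seed=7)
    print("standard       :", standard)
    print("moment matching:", matched)
    print("linearized     :", linearized)
```

For large $n$ the moment matching estimator and the linearized expression should agree far more closely than either agrees with the standard estimator, which is what the approximation asserts.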

Thus, moment matching is asymptotically equivalent to using

$$\frac{\sigma}{s} - 1 \quad \text{and} \quad \mu - \frac{\sigma}{s}\bar{Z} \qquad (20)$$

as controls (both quantities converge to zero almost surely) with estimates of coefficients $\beta_1, \beta_2$. In general, these do not coincide with the optimal coefficients $\beta_1^*, \beta_2^*$, so moment matching is asymptotically dominated by the control variate method. In addition, the controls in (20) introduce some bias (as does moment matching itself) because though they converge to zero they do not have mean zero for finite $n$. In contrast, the more natural moment control variates $(s^2 - \sigma^2)$ and $(\bar{Z} - \mu)$ have mean zero for all $n$ and thus introduce no bias.

References

Acworth, P., M. Broadie, and P. Glasserman, 1997, A Comparison of Some Monte Carlo and Quasi Monte Carlo Methods for Option Pricing, in Monte Carlo and Quasi Monte Methods for Scientific Computing, G. Larcher, P. Hellekalek, H. Niederreiter, and P. Zinterhof (eds.), Springer-Verlag, Berlin.
Andersen, L., 1995, Efficient Techniques for Simulation of Interest Rate Models Involving Non-Linear Stochastic Differential Equations, Working paper (General Re Financial Products, New York, NY).
Andersen, L., and R. Brotherton-Ratcliffe, 1996, Exact Exotics, Risk 9, October, 85–89.
Barlow, R.E. and F. Proschan, 1975, Statistical Theory of Reliability and Life Testing (Holt, Rinehart and Winston, New York).
Barraquand, J., 1995, Numerical Valuation of High Dimensional Multivariate European Securities, Management Science 41, 1882–1891.


Barraquand, J. and D. Martineau, 1995, Numerical Valuation of High Dimensional Multivariate American Securities, Journal of Financial and Quantitative Analysis 30, 383–405.
Beaglehole, D., P. Dybvig, and G. Zhou, 1997, Going to Extremes: Correcting Simulation Bias in Exotic Option Valuation, Financial Analysts Journal (Jan/Feb) 62–68.
Beckström, R. and A. Campbell, 1995, An Introduction to VAR (CATS Software, Palo Alto, California).
Berman, L., 1996, Comparison of Path Generation Methods for Monte Carlo Valuation of Single Underlying Derivative Securities, Research Report RC-20570, IBM Research, Yorktown Heights, New York.
Birge, J.R., 1994, Quasi-Monte Carlo Approaches to Option Pricing, Technical Report 94–119 (Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109).
Bossaerts, P., 1989, Simulation Estimators of Optimal Early Exercise, Working paper (Carnegie-Mellon University, Pittsburgh, PA, 15213).
Boyle, P., 1977, Options: A Monte Carlo Approach, Journal of Financial Economics 4, 323–338.
Boyle, P. and D. Emanuel, 1985, The Pricing of Options on the Generalized Mean, Working paper (University of Waterloo).
Bratley, P. and B. Fox, 1988, ALGORITHM 659: Implementing Sobol’s Quasirandom Sequence Generator, ACM Transactions on Mathematical Software 14, 88–100.
Bratley, P., B.L. Fox, and H. Niederreiter, 1992, Implementation and Tests of Low-Discrepancy Sequences, ACM Transactions on Modelling and Computer Simulation 2, 195–213.
Bratley, P., B.L. Fox, and L. Schrage, 1987, A Guide to Simulation, 2nd Ed. (Springer-Verlag, New York).
Broadie, M. and J. Detemple, 1997, The Valuation of American Options on Multiple Assets, Mathematical Finance 7, 241–286.
Broadie, M. and J. Detemple, 1996, American Option Valuation: New Bounds, Approximations, and a Comparison of Existing Methods, Review of Financial Studies 9, 1211–1250.
Broadie, M. and P. Glasserman, 1996, Estimating Security Price Derivatives by Simulation, Management Science 42, 269–285.
Broadie, M. and P. Glasserman, 1997, Pricing American-Style Securities Using Simulation, Journal of Economic Dynamics and Control 21, 1323–1352.
Broadie, M. and P. Glasserman, 1997, A Stochastic Mesh Method for Pricing High-Dimensional American Options, Working paper, Columbia Business School, New York.
Broadie, M., P. Glasserman, and Z. Ha, 2000, Pricing American Options by Simulation Using a Stochastic Mesh with Optimized Weights, in Probabilistic Constrained Optimization, S. Uryasev, ed., 26–44 (Kluwer, Norwell, Mass.).
Caflisch, R.E., W. Morokoff, and A. Owen, 1998, Valuation of Mortgage Backed Securities Using Brownian Bridges to Reduce Effective Dimension, in Monte Carlo: Methodologies and Applications for Pricing and Risk Management, 301–314 (Risk Publications, London).
Carr, P., 1993, Deriving Derivatives of Derivative Securities, Working paper (Johnson Graduate School of Business, Cornell University).
Carriere, J.F., 1996, Valuation of the Early-Exercise Price for Derivative Securities using Simulations and Splines, Insurance: Mathematics and Economics 19, 19–30.
Carverhill, A. and K. Pang, 1995, Efficient and Flexible Bond Option Valuation in the Heath, Jarrow and Morton Framework, Journal of Fixed Income 5, September, 70–77.
Cheyette, O., 1992, Term Structure Dynamics and Mortgage Valuation, Journal of Fixed Income 2, March, 28–41.
Clewlow, L. and A. Carverhill, 1994, On the Simulation of Contingent Claims, Journal of Derivatives 2, Winter, 66–74.
Devroye, L., 1986, Non-Uniform Random Variate Generation (Springer-Verlag, New York).
Dixit, A. and R. Pindyck, 1994, Investment Under Uncertainty (Princeton University Press).
Duan, J.-C., 1995, The GARCH Option Pricing Model, Mathematical Finance 5, 13–32.
Duan, J.-C. and J.-G. Simonato, 1998, Empirical Martingale Simulation for Asset Prices, Management Science 44, 1218–1233.
Duffie, D., 1996, Dynamic Asset Pricing Theory, 2nd ed. (Princeton University Press, Princeton, New Jersey).
Duffie, D. and P. Glynn, 1995, Efficient Monte Carlo Simulation of Security Prices, Annals of Applied Probability 5, 897–905.
Faure, H., 1982, Discrépance de Suites Associées à un Système de Numération (en Dimension s), Acta Arithmetica 41, 337–351.
Fox, B.L., 1986, ALGORITHM 647: Implementation and Relative Efficiency of Quasi-Random Sequence Generators, ACM Transactions on Mathematical Software 12, 362–376.
Fu, M. and J.Q. Hu, 1995, Sensitivity Analysis for Monte Carlo Simulation of Option Pricing, Probability in the Engineering and Information Sciences 9, 417–446.
Fu, M., D. Madan, and T. Wong, 1998, Pricing Continuous Time Asian Options: A Comparison of Analytical and Monte Carlo Methods, Journal of Computational Finance 2, 49–74.
Geske, R. and H.E. Johnson, 1984, The American Put Option Valued Analytically, Journal of Finance 39, 1511–1524.
Glasserman, P., 1991, Gradient Estimation via Perturbation Analysis (Kluwer Academic Publishers, Norwell, Mass).
Glasserman, P., 1993, Filtered Monte Carlo, Mathematics of Operations Research 18, 610–634.
Glasserman, P., P. Heidelberger, and P. Shahabuddin, 2000, Variance Reduction Techniques for Estimating Value-at-Risk, Management Science 46, 1349–1365.
Glasserman, P. and D.D. Yao, 1992, Some Guidelines and Guarantees for Common Random Numbers, Management Science 38, 884–908.
Glynn, P.W., 1987, Likelihood Ratio Gradient Estimation: An Overview, in: Proceedings of the Winter Simulation Conference (The Society for Computer Simulation, San Diego, California) 366–374.
Glynn, P.W., 1989, Optimization of Stochastic Systems via Simulation, in: Proceedings of the Winter Simulation Conference (The Society for Computer Simulation, San Diego, California) 90–105.
Glynn, P.W. and D.L. Iglehart, 1988, Simulation Methods for Queues: An Overview, Queueing Systems 3, 221–255.
Glynn, P.W. and W. Whitt, 1989, Indirect Estimation via L = λW, Operations Research 37, 82–103.
Glynn, P.W. and W. Whitt, 1992, The Asymptotic Efficiency of Simulation Estimators, Operations Research 40, 505–520.
Grant, D., G. Vora, and D. Weeks, 1997, Path-Dependent Options: Extending the Monte Carlo Simulation Approach, Management Science 43, 1589–1602.
Halton, J.H., 1960, On the Efficiency of Certain Quasi-Random Sequences of Points in Evaluating Multi-Dimensional Integrals, Numerische Mathematik 2, 84–90.
Hammersley, J.M. and D.C. Handscomb, 1964, Monte Carlo Methods (Chapman and Hall, London).
Haselgrove, C.B., 1961, A Method for Numerical Integration, Mathematics of Computation 15, 323–337.
Hlawka, E., 1971, Discrepancy and Riemann Integration, in: L. Mirsky, ed., Studies in Pure Mathematics (Academic Press, New York).
Hull, J., 2000, Options, Futures, and Other Derivative Securities, 4th ed. (Prentice-Hall, Englewood Cliffs, New Jersey).
Hull, J. and A. White, 1987, The Pricing of Options on Assets with Stochastic Volatilities, Journal of Finance 42, 281–300.
Iben, B. and R. Brotherton-Ratcliffe, 1994, Credit Loss Distributions and Required Capital for Derivatives Portfolios, Journal of Fixed Income 4, June, 6–14.
Johnson, H., 1987, Options on the Maximum or the Minimum of Several Assets, Journal of Financial and Quantitative Analysis 22, 227–283.
Johnson, H. and D. Shanno, 1987, Option Pricing When the Variance is Changing, Journal of Financial and Quantitative Analysis 22, 143–151.
Joy, C., P.P. Boyle, and K.S. Tan, 1996, Quasi-Monte Carlo Methods in Numerical Finance, Management Science 42, 926–938.
Kemna, A.G.Z. and A.C.F. Vorst, 1990, A Pricing Method for Options Based on Average Asset Values, Journal of Banking and Finance 14, 113–129.
Kloeden, P. and E. Platen, 1992, Numerical Solution of Stochastic Differential Equations (Springer-Verlag, New York).
L’Ecuyer, P. and G. Perron, 1994, On the Convergence Rates of IPA and FDC Derivative Estimators, Operations Research 42, 643–656.
Lavenberg, S.S. and P.D. Welch, 1981, A Perspective on the Use of Control Variables to Increase the Efficiency of Monte Carlo Simulations, Management Science 27, 322–335.
Lawrence, D., 1994, Aggregating Credit Exposures: The Simulation Approach, in: Derivative Credit Risk (Risk Publications, London).
Longstaff, F.A. and E.S. Schwartz, 2001, Valuing American Options by Simulation: A Simple Least Squares Approach, Review of Financial Studies 14, 113–148.
Marchuk, G. and V. Shaidurov, 1983, Difference Methods and Their Extrapolations (Springer Verlag, New York).
McKay, M.D., W.J. Conover, and R.J. Beckman, 1979, A Comparison of Three Methods for Selecting Input Variables in the Analysis of Output from a Computer Code, Technometrics 21, 239–245.
Morokoff, W.J. and R.E. Caflisch, 1995, Quasi-Monte Carlo Integration, Journal of Computational Physics 122, 218–230.
Moskowitz, B. and R.E. Caflisch, 1996, Smoothness and Dimension Reduction in Quasi-Monte Carlo Methods, Mathematical and Computer Modeling 23, 37–54.
Niederreiter, H., 1988, Low Discrepancy and Low Dispersion Sequences, Journal of Number Theory 30, 51–70.
Niederreiter, H., 1976, On the Distribution of Pseudo-Random Numbers Generated by the Linear Congruential Method. III, Mathematics of Computation 30, 571–597.
Niederreiter, H., 1992, Random Number Generation and Quasi-Monte Carlo Methods (CBMS-NSF 63, SIAM, Philadelphia, Pa).
Niederreiter, H. and C. Xing, 1996, Low-Discrepancy Sequences and Global Function Fields with Many Rational Places, Finite Fields and their Applications 2, 241–273.
Nielsen, S., 1994, Importance Sampling in Lattice Pricing Models, Working paper (Management Science and Information Systems, University of Texas at Austin).
Ninomiya, S., and S. Tezuka, 1996, Toward Real-Time Pricing of Complex Financial Derivatives, Applied Mathematical Finance 3, 1–20.
Owen, A., 1995a, Monte Carlo Variance of Scrambled Equidistribution Quadrature, in: H. Niederreiter and P.J.S. Shiue, eds., Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (Springer-Verlag, Berlin).
Owen, A., 1995b, Randomly Permuted (t, m, s)-Nets and (t, s)-Sequences, in Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, H. Niederreiter and P. Shiue (eds.), 299–317 (Springer-Verlag, New York).
Paskov, S. and J. Traub, 1995, Faster Valuation of Financial Derivatives, Journal of Portfolio Management 22, Fall, 113–120.
Pollard, D., 1984, Convergence of Stochastic Processes, Springer-Verlag, New York.
Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, 1992, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. (Cambridge University Press).
Raymar, S., and M. Zwecher, 1997, A Monte Carlo Valuation of American Call Options On the Maximum of Several Stocks, Journal of Derivatives 5 (Fall), 7–24.
Reider, R., 1993, An Efficient Monte Carlo Technique for Pricing Options, Working paper (Wharton School, University of Pennsylvania).
Rubinstein, R. and A. Shapiro, 1993, Discrete Event Systems (Wiley, New York).
Rust, J., 1997, Using Randomization to Break the Curse of Dimensionality, Econometrica 65, 487–516.
Schwartz, E.S. and W.N. Torous, 1989, Prepayment and the Valuation of Mortgage-Backed Securities, Journal of Finance 44, 375–392.
Scott, L.O., 1987, Option Pricing when the Variance Changes Randomly: Theory, Estimation, and an Application, Journal of Financial and Quantitative Analysis 22, 419–438.
Shaw, J., 1995, Beyond VAR and Stress Testing, in Monte Carlo: Methodologies and Applications for Pricing and Risk Management, 231–244 (Risk Publications, London). Sobol’, I.M., 1967, On the Distribution of Points in a Cube and the Approximate Evaluation of Integrals, USSR Computational Mathematics and Mathematical Physics 7, 86–112. Spanier, J. and E.H. Maize, 1994, Quasi-Random Methods for Estimating Integrals Using Relatively Small Samples, SIAM Review 36, 18–44. Stein, M., 1987, Large Sample Properties of Simulations Using Latin Hypercube Sampling, Technometrics 29, 143–151. Stulz, R.M., 1982, Options on the Minimum or the Maximum of Two Risky Assets, Journal of Financial Economics 10, 161–185. Tezuka, S., 1994, A Generalization of Faure Sequences and its Efficient Implementation, Research Report RTO105 (IBM Research, Tokyo Research Laboratory, Kanagawa, Japan). Tezuka, S., 1995, Uniform Random Numbers: Theory and Practice (Kluwer Academic Publishers, Boston). Tilley, J.A., 1993, Valuing American Options in a Path Simulation Model, Transactions of the Society of Actuaries 45, 83–104. Turnbull, S.M. and L.M. Wakeman, 1991, A Quick Algorithm for Pricing European Average Options, Journal of Financial and Quantitative Analysis 26, 377–389. Van Rensberg J. and G.M. Torrie, 1993, Estimation of Multidimensional Integrals: Is


Monte Carlo the Best Method?, Journal of Physics A: Mathematical and General 26, 943–953. Wiggins, J.B., 1987, Option Values under Stochastic Volatility: Theory and Empirical Evidence, Journal of Financial Economics 19, 351–372. Willard, G.A., 1997, Calculating Prices and Sensitivities for Path-Dependent Derivative Securities in Multifactor Models, Journal of Derivatives 5 (Fall), 45–61. Worzel, K.J., C. Vassiadou-Zeniou, and S.A. Zenios, 1994, Integrated Simulation and Optimization Models for Tracking Indices of Fixed-Income Securities, Operations Research 42, 223–233. Zaremba, S.K., 1968, The Mathematical Basis of Monte Carlo and Quasi-Monte Carlo Methods, SIAM Review 10, 310–314.

Part two Interest Rate Modeling

7 A Geometric View of Interest Rate Theory

Tomas Björk

1 Introduction

1.1 Setup

We consider a bond market model (see Björk (1997), Musiela and Rutkowski (1997)) living on a filtered probability space $(\Omega, \mathcal{F}, \mathbf{F}, Q)$, where $\mathbf{F} = \{\mathcal{F}_t\}_{t \ge 0}$. The basis is assumed to carry a standard $m$-dimensional Wiener process $W$, and we also assume that the filtration $\mathbf{F}$ is the internal one generated by $W$. By $p(t, x)$ we denote the price, at $t$, of a zero coupon bond maturing at $t + x$, and the forward rates $r(t, x)$ are defined by

$$ r(t, x) = -\frac{\partial \log p(t, x)}{\partial x}. $$
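As a small numerical illustration of this definition, the sketch below recovers a forward curve from zero coupon prices by differentiating $-\log p(t, x)$ with a central difference. The particular curve, the step size `h` and all function names are hypothetical choices made only for the example.

```python
import math

def forward_curve(x):
    """Hypothetical smooth forward curve x -> r(t, x) at a fixed time t."""
    return 0.03 + 0.01 * (1.0 - math.exp(-x))

def bond_price(x):
    """Zero coupon price p(t, x) = exp(-int_0^x r(t, u) du), in closed form."""
    integral = 0.03 * x + 0.01 * (x - (1.0 - math.exp(-x)))
    return math.exp(-integral)

def forward_from_prices(x, h=1e-5):
    """Central-difference approximation of -d/dx log p(t, x)."""
    return -(math.log(bond_price(x + h)) - math.log(bond_price(x - h))) / (2.0 * h)

maturities = [0.5, 1.0, 2.0, 5.0, 10.0]
max_error = max(abs(forward_from_prices(x) - forward_curve(x)) for x in maturities)
print("largest gap between the two forward curves:", max_error)
```

The gap is at the level of the finite-difference error, which is what the definition predicts.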

Note that we use the Musiela parameterization, where $x$ denotes the time to maturity. The short rate $R$ is defined as $R(t) = r(t, 0)$, and the money account $B$ is given by $B(t) = \exp\left\{\int_0^t R(s)\,ds\right\}$. The model is assumed to be free of arbitrage in the sense that the measure $Q$ above is a martingale measure for the model. In other words, for every fixed time of maturity $T \ge 0$, the process $Z(t, T) = p(t, T - t)/B(t)$ is a $Q$-martingale.

Let us now consider a given forward rate model of the form

$$ \begin{cases} dr(t, x) = \beta(t, x)\,dt + \sigma(t, x)\,dW, \\ r(0, x) = r^o(0, x), \end{cases} \tag{1} $$

where, for each $x$, $\beta$ and $\sigma$ are given optional processes. The initial curve $\{r^o(0, x);\ x \ge 0\}$ is taken as given. It is interpreted as the observed forward rate curve. The standard Heath–Jarrow–Morton drift condition (Heath, Jarrow and Morton (1992)) can easily be transferred to the Musiela parameterization. The result (see Brace and Musiela (1994), Musiela (1993)) is as follows.


Proposition 1.1 (The forward rate equation) Under the martingale measure $Q$ the $r$-dynamics are given by

$$ dr(t, x) = \left\{ \frac{\partial}{\partial x} r(t, x) + \sigma(t, x) \int_0^x \sigma(t, u)^\top du \right\} dt + \sigma(t, x)\,dW(t), \tag{2} $$

$$ r(0, x) = r^o(0, x), \tag{3} $$

where $\top$ denotes transpose.

1.2 Main problems

Suppose now that we are given a concrete model $\mathcal{M}$ within the above framework, i.e. suppose that we are given a concrete specification of the volatility process $\sigma$. We now formulate a couple of natural problems:

1. Take, in addition to $\mathcal{M}$, also as given a parameterized family $\mathcal{G}$ of forward rate curves. Under which conditions is the family $\mathcal{G}$ consistent with the dynamics of $\mathcal{M}$? Here consistency is interpreted in the sense that, given an initial forward rate curve in $\mathcal{G}$, the interest rate model $\mathcal{M}$ will only produce forward rate curves belonging to the given family $\mathcal{G}$.
2. When can the given, inherently infinite dimensional, interest rate model $\mathcal{M}$ be written as a finite dimensional state space model? More precisely, we seek conditions under which the forward rate process $r(t, x)$, induced by the model $\mathcal{M}$, can be realized by a system of the form

$$ dZ_t = a(Z_t)\,dt + b(Z_t)\,dW_t, \tag{4} $$

$$ r(t, x) = G(Z_t, x), \tag{5} $$

where $Z$ (interpreted as the state vector process) is a finite dimensional diffusion, $a(z)$, $b(z)$ and $G(z, x)$ are deterministic functions and $W$ is the same Wiener process as in (2).

As will be seen below, these two problems are intimately connected, and the main purpose of this chapter is to give an overview of some recent work in this area. The text is mainly based on Björk and Christensen (1999), Björk and Gombani (1999) and Björk and Svensson (1999), but the presentation given below is more focused on geometric intuition than the original articles, where full proofs, technical details and further results can be found. In the analysis below we use ideas from systems and control theory (see Isidori (1989)) as well as from nonlinear filtering theory (see Brockett (1981)). References to the literature will sometimes be given in the text, but will mainly be summarized in the Notes at the end of each section.

The organization of the text is as follows. In Section 2 we study the existence of a finite dimensional factor realization in the comparatively simple case when the forward rate volatilities are deterministic. In Section 3 we study the general consistency problem, and in Section 4 we use the consistency results from Section 3 in order to give a fairly complete picture of the nonlinear realization problem.

2 Linear realization theory

In the general case, the forward rate equation (2) is a highly nonlinear infinite dimensional SDE but, as can be expected, the special case of linear dynamics is much easier to handle. In this section we therefore concentrate on linear forward rate models, and look for finite dimensional linear realizations.

2.1 Deterministic forward rate volatilities

For the rest of the section we only consider the case when the volatility $\sigma(t, x) = [\sigma_1(t, x), \ldots, \sigma_m(t, x)]$ is a deterministic time-independent function $\sigma(x)$ of $x$ only.

Assumption 2.1 The volatility $\sigma$ is a deterministic $C^\infty$-mapping $\sigma : R_+ \to R^m$.

Denoting the function $x \mapsto r(t, x)$ by $r(t)$ we have, from (2),

$$ dr(t) = \{\mathbf{F} r(t) + D\}\,dt + \sigma\,dW(t), \tag{6} $$

$$ r(0) = r^o(0). \tag{7} $$

Here the linear operator $\mathbf{F}$ is defined by

$$ \mathbf{F} = \frac{\partial}{\partial x}, \tag{8} $$

whereas the function $D$ is given by

$$ D(x) = \sigma(x) \int_0^x \sigma(s)^\top ds. \tag{9} $$

The point to note here is that, because of our choice of a deterministic volatility $\sigma(x)$, the forward rate equation (6) is a linear (or rather affine) SDE. Because of this linearity (albeit in infinite dimensions) we therefore expect to be able to provide an explicit solution of (6). We now recall that a scalar equation of the form

$$ dy(t) = [a y(t) + b]\,dt + c\,dW(t) $$

has the solution

$$ y(t) = e^{at} y(0) + \int_0^t e^{a(t-s)} b\,ds + \int_0^t e^{a(t-s)} c\,dW(s), $$

and we are led to conjecture that the solution to (6) is given by the formal expression

$$ r(t) = e^{\mathbf{F}t} r^o + \int_0^t e^{\mathbf{F}(t-s)} D\,ds + \int_0^t e^{\mathbf{F}(t-s)} \sigma\,dW(s). $$

The formal exponential $e^{\mathbf{F}t}$ acts on real valued functions, and we have to figure out how it operates. From the standard series expansion of the exponential function one is led to write

$$ e^{\mathbf{F}t} f(x) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \mathbf{F}^n f(x). \tag{10} $$

In our case $\mathbf{F}^n = \partial^n / \partial x^n$, so (assuming $f$ to be analytic) we have

$$ e^{\mathbf{F}t} f(x) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \frac{\partial^n f}{\partial x^n}(x). \tag{11} $$

This is, however, just a Taylor series expansion of $f$ around the point $x$, so for an analytic $f$ we have $e^{\mathbf{F}t} f(x) = f(x + t)$. We have in fact the following precise result (which can be proved rigorously).

Proposition 2.2 The operator $\mathbf{F}$ is the infinitesimal generator of the semigroup of left translations, i.e. for any $f \in C[0, \infty)$ we have

$$ e^{\mathbf{F}t} f(x) = f(t + x). $$

The solution of the forward rate equation (6) is given as

$$ r(t, x) = e^{\mathbf{F}t} r^o(0, x) + \int_0^t e^{\mathbf{F}(t-s)} D(x)\,ds + \int_0^t e^{\mathbf{F}(t-s)} \sigma(x)\,dW(s) \tag{12} $$

or equivalently by

$$ r(t, x) = r^o(0, x + t) + \int_0^t D(x + t - s)\,ds + \int_0^t \sigma(x + t - s)\,dW(s). \tag{13} $$
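The translation property in Proposition 2.2 can be checked numerically for analytic functions by summing the series (10)–(11) directly. The sketch below does this for two test functions whose derivatives are known in closed form; the functions, the parameter `a` and the truncation level `n_terms` are illustrative assumptions, not part of the original text.

```python
import math

def translation_by_series(derivative, x, t, n_terms=40):
    """Partial sum of (10)-(11): sum over n < n_terms of t^n / n! * f^(n)(x).

    `derivative(n, x)` must return the n-th derivative of f at x.
    """
    total = 0.0
    for n in range(n_terms):
        total += t ** n / math.factorial(n) * derivative(n, x)
    return total

a = 0.7  # hypothetical decay rate, chosen only for illustration

def exp_derivative(n, x):
    """n-th derivative of f(x) = exp(-a x)."""
    return (-a) ** n * math.exp(-a * x)

def sin_derivative(n, x):
    """n-th derivative of f(x) = sin(x)."""
    return math.sin(x + n * math.pi / 2.0)

x, t = 1.5, 2.0
err_exp = abs(translation_by_series(exp_derivative, x, t) - math.exp(-a * (x + t)))
err_sin = abs(translation_by_series(sin_derivative, x, t) - math.sin(x + t))
print("series vs f(x + t):", err_exp, err_sin)
```

In both cases the truncated series agrees with $f(x + t)$ up to rounding error, which is the content of $e^{\mathbf{F}t} f(x) = f(x + t)$ for analytic $f$.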

From (12) it is clear by inspection that we may write the forward rate equation (6) as

$$ dr_0(t, x) = \mathbf{F} r_0(t, x)\,dt + \sigma(x)\,dW(t), \qquad r_0(0, x) = 0, \tag{14} $$

$$ r(t, x) = r_0(t, x) + \delta(t, x), \tag{15} $$

where $\delta$ is given by

$$ \delta(t, x) = r^o(0, x + t) + \int_0^t D(x + t - s)\,ds. \tag{16} $$

Since $\delta(t, x)$ is not affected by the input $W$, we see that the problem of finding a realization for the term structure system (6) is equivalent to that of finding a realization for (14). We are thus led to the following definition.

Definition 2.3 A matrix triple $[A, B, C(x)]$ is called an $n$-dimensional realization of the systems (6) and (14) if $r_0$ has the representation

$$ dZ(t) = A Z(t)\,dt + B\,dW(t), \qquad Z(0) = 0, \tag{17} $$

$$ r_0(t, x) = C(x) Z(t). \tag{18} $$

Our main problems are now as follows.

• Take as a priori given a volatility structure $\sigma(x)$.
• When does there exist a finite dimensional realization?
• If there exists a finite dimensional realization, what is the minimal dimension?
• How do we construct a minimal realization from knowledge of $\sigma$?
• Is there an economic interpretation of the state process $Z$ in the realization?

2.2 Existence of finite linear realizations

We will now go on to study the existence of a finite dimensional realization of the stochastic system (14), and in order to get some ideas, suppose that there actually exists a finite dimensional realization of (14) of the form (17)–(18). Solving (14), we have

$$ r_0(t, x) = \int_0^t e^{\mathbf{F}(t-s)} \sigma(x)\,dW(s) = \int_0^t \sigma(x + t - s)\,dW(s), $$

while, from the realization (17)–(18), we also have

$$ r_0(t, x) = C(x) Z(t) = C(x) \int_0^t e^{A(t-s)} B\,dW(s). $$

Thus we have, with probability one, for each $x$ and each $t$,

$$ \int_0^t \sigma_x(t - s)\,dW(s) = \int_0^t C(x) e^{A(t-s)} B\,dW(s), \tag{19} $$

where we use subindex $x$ to denote left translation, i.e. $f_x(t) = f(x + t)$. This leads us immediately to conjecture that the equation

$$ \sigma_x(t) = C(x) e^{At} B $$

must hold for all $x$ and $t$, and we have our first main result.

Proposition 2.4
1. The forward rate process has a finite dimensional linear realization if and only if the volatility function $\sigma$ can be written in the form

$$ \sigma(x) = C_0 e^{Ax} B. \tag{20} $$

2. If $\sigma$ has the form (20) then a concrete realization of $r_0$ is given by

$$ dZ(t) = A Z(t)\,dt + B\,dW(t), \qquad Z(0) = 0, \tag{21} $$

$$ r_0(t, x) = C(x) Z(t), \tag{22} $$

with $A$, $B$ as in (20), and with $C(x) = C_0 e^{Ax}$. The forward rates $r(t, x)$ are then given by (15)–(16).

Proof It is clear from the discussion above that if there exists a finite realization, then we must have the factorization $\sigma_x(t) = C(x) e^{At} B$. Setting $x = 0$, and denoting $C(0)$ by $C_0$, in this case gives us the relation (20). If, on the other hand, $\sigma$ factors as in (20), then we simply define $Z$ as in (21). A direct calculation as above then shows that we have $r_0(t, x) = C_0 e^{Ax} Z(t)$.

Remark 2.5 Let us call a function of the form $c e^{Ax} b$, where $c$ is a row vector, $A$ is a square matrix and $b$ is a column vector, a quasi-exponential (or QE) function. The general form of a quasi-exponential function $f$ is given by

$$ f(x) = \sum_i e^{\lambda_i x} + \sum_j e^{\alpha_j x} \left[ p_j(x) \cos(\omega_j x) + q_j(x) \sin(\omega_j x) \right], \tag{23} $$

where $\lambda_i$, $\alpha_j$, $\omega_j$ are real numbers, whereas $p_j$ and $q_j$ are real polynomials.

QE functions will turn up again, so we list some simple properties.

Lemma 2.6 The following hold for the quasi-exponential functions:
• A function is QE if and only if it is a component of the solution of a vector valued linear ODE with constant coefficients.
• A function is QE if and only if it can be written as $f(x) = c e^{Ax} b$.
• If $f$ is QE, then $f'$ is QE.
• If $f$ is QE, then its primitive function is QE.
• If $f$ and $g$ are QE, then $fg$ is QE.
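Identity (19) implies in particular that both stochastic integrals have the same variance, which by the Itô isometry is a purely deterministic statement. The following sketch checks this necessary condition for a scalar exponential volatility $\sigma(x) = \sigma_0 e^{-ax}$, with the factorization $C_0 = 1$, $A = -a$, $B = \sigma_0$ in the notation of Proposition 2.4. The parameter values, the Simpson quadrature and the function names are assumptions made for the illustration.

```python
import math

a, sigma0 = 0.5, 0.02  # hypothetical parameters

def sigma(x):
    """Scalar volatility sigma(x) = sigma0 * exp(-a x)."""
    return sigma0 * math.exp(-a * x)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def variance_from_forward_rates(t, x):
    """Variance of int_0^t sigma(x + t - s) dW(s), via the Ito isometry."""
    return simpson(lambda s: sigma(x + t - s) ** 2, 0.0, t)

def variance_from_realization(t, x):
    """Variance of C(x) Z(t), with C(x) = exp(-a x) and
    dZ = -a Z dt + sigma0 dW, Z(0) = 0 (closed-form variance of Z)."""
    var_z = sigma0 ** 2 * (1.0 - math.exp(-2.0 * a * t)) / (2.0 * a)
    return math.exp(-a * x) ** 2 * var_z

t, x = 3.0, 2.0
v_forward = variance_from_forward_rates(t, x)
v_state = variance_from_realization(t, x)
print(v_forward, v_state)
```

Equal variances are of course only a consistency check, not a proof of (19); the full statement is the pathwise identity in Proposition 2.4.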


2.3 Transfer functions

Using ideas from linear systems theory, an alternative view of the realization problem is obtained by studying transfer functions, i.e. by going to the frequency domain. To get some intuition, consider again the equation

$$ dr_0(t, x) = \mathbf{F} r_0(t, x)\,dt + \sigma(x)\,dW(t), \qquad r_0(0, x) = 0. \tag{24} $$

Let us now formally “divide by $dt$”, which gives us

$$ \frac{dr_0}{dt}(t, x) = \mathbf{F} r_0(t, x) + \sigma(x) \frac{dW}{dt}(t), $$

where the formal time derivative $\frac{dW}{dt}(t)$ is interpreted as white noise. We interpret this equation as an input–output system where the random input signal $t \mapsto \frac{dW}{dt}(t)$ is transformed into the infinite dimensional output signal $t \mapsto r_0(t, \cdot)$. We thus view the equation as a version of the following controlled ODE:

$$ \frac{dr_0}{dt}(t, x) = \mathbf{F} r_0(t, x) + \sigma(x) u(t), \qquad r_0(0) = 0, \tag{25} $$

where $u$ is a deterministic input signal. Generally speaking, tricks like this do not work directly, since we are ignoring the difference between standard differential calculus, which is used to analyze (25), and Itô calculus which we use when dealing with SDEs. In this case, however, because of the linear structure, the second order Itô term will not come into play, so we are safe. (See the discussion in Section 3.4 around the Stratonovich integral for how to treat the nonlinear situation.)

It is now natural to study the transfer function for the system (25), which relates the Laplace transform of the input signal to the Laplace transform of the output signal.

Definition 2.7 The transfer function, $K(s, x)$, for (25) is determined by the relation

$$ \tilde{r}_0(s, x) = K(s, x)\,\tilde{u}(s), $$

where $\tilde{\ }$ denotes the Laplace transform in the $t$-variable.

From the uniqueness of the Laplace transform we then have the following result.

Lemma 2.8 The system

$$ dZ(t) = A Z(t)\,dt + B\,dW(t), \qquad Z(0) = 0, \tag{26} $$

$$ r_0(t, x) = C(x) Z(t) \tag{27} $$

is a realization of

$$ dr_0(t, x) = \mathbf{F} r_0(t, x)\,dt + \sigma(x)\,dW(t), \qquad r_0(0, x) = 0 \tag{28} $$

if and only if the deterministic control system

$$ \frac{dr_0}{dt}(t, x) = \mathbf{F} r_0(t, x) + \sigma(x) u(t) \tag{29} $$

has the same transfer function as the system

$$ \frac{dZ}{dt}(t) = A Z(t) + B u(t), \tag{30} $$

$$ r_0(t, x) = C(x) Z(t). \tag{31} $$

Furthermore we have

Lemma 2.9 The transfer function $K(s, x)$ of (29) is given by

$$ K(s, x) = \mathcal{L}[\sigma_x](s), $$

where $\mathcal{L}$ denotes the Laplace transform, and $\sigma_x$ denotes left translation.

Proof From (29) we have

$$ r_0(t, x) = \int_0^t \sigma(x + t - s) u(s)\,ds = [\sigma_x \star u](t), $$

and thus $\tilde{r}_0(s, x) = \mathcal{L}[\sigma_x](s)\,\tilde{u}(s)$.

For concrete computation of a realization, the following result is useful.

Lemma 2.10
• The transfer function of the system (30)–(31) is given by $K(s, x) = C(x)[sI - A]^{-1} B$.
• The $r_0$ system has a finite realization if and only if there exists a factorization of the form $\mathcal{L}[\sigma_x](s) = C(x)[sI - A]^{-1} B$.
• Denote the transfer function of $r_0$ by $K(s, x)$, and assume that there exists a finite dimensional realization. If we have found $A$, $B$ and $C$ such that $K(s, 0) = C[sI - A]^{-1} B$, then a realization of $r_0$ is given by $A$, $B$, $C e^{Ax}$.


Proof The first assertion is immediately obtained by taking the Laplace transform of (30)–(31). The second follows from Lemma 2.8, and the third from Proposition 2.4.

If we want to find a concrete realization for a given system, we thus have two possibilities. We can either look for a factorization of the volatility function as $\sigma(x) = C e^{Ax} B$, or we can try to factor the transfer function as $K(s, 0) = C[sI - A]^{-1} B$. From a logical point of view the two approaches are equivalent, but from a practical point of view it is much easier to factor the transfer function than to factor the volatility. There are in fact a number of standard algorithms in the systems theoretic literature which construct a realization, given knowledge of the transfer functions. See Brockett (1970).
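The equivalence of the two approaches can be illustrated numerically. The sketch below assumes a scalar exponential volatility $\sigma(x) = \sigma_0 e^{-ax}$ with the scalar triple $A = -a$, $B = \sigma_0$, $C(x) = e^{-ax}$, approximates $\mathcal{L}[\sigma_x](s)$ by a truncated Simpson quadrature, and compares it with $C(x)[sI - A]^{-1} B$. Parameter values, the truncation horizon and the function names are hypothetical.

```python
import math

a, sigma0 = 0.5, 0.02  # hypothetical parameters

def sigma(x):
    """Scalar volatility sigma(x) = sigma0 * exp(-a x)."""
    return sigma0 * math.exp(-a * x)

def laplace_of_translate(s, x, horizon=80.0, n=40000):
    """Simpson approximation of int_0^inf exp(-s t) sigma(x + t) dt,
    truncated at `horizon` (the integrand decays exponentially)."""
    h = horizon / n
    f = lambda t: math.exp(-s * t) * sigma(x + t)
    total = f(0.0) + f(horizon)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

def transfer_from_realization(s, x):
    """C(x) [s I - A]^{-1} B for A = -a, B = sigma0, C(x) = exp(-a x)."""
    return math.exp(-a * x) * sigma0 / (s + a)

max_gap = max(
    abs(laplace_of_translate(s, x) - transfer_from_realization(s, x))
    for s in (0.1, 1.0, 3.0)
    for x in (0.0, 2.0)
)
print("largest gap between the two transfer functions:", max_gap)
```

For rational transfer functions of higher degree the same comparison applies, with the scalar division replaced by a matrix inverse.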

2.4 Minimal realizations

The purpose of this section is to determine the minimal dimension of a finite dimensional realization.

Definition 2.11 The dimension of a realization $[A, B, C(x)]$ is defined as the dimension of the corresponding state space. A realization $[A, B, C(x)]$ is said to be minimal if there is no other realization with smaller dimension. The McMillan degree, $\mathcal{D}$, of the forward rate system is defined as the dimension of a minimal realization.

In order to get a feeling for how to determine the McMillan degree, we note that $r_0$ has a finite dimensional realization if and only if $r_0$ evolves on a finite dimensional subspace in the infinite dimensional function space $\mathcal{H}$. Furthermore, it seems obvious that the McMillan degree equals the dimension of this subspace. In order to determine the subspace above, let us again view the $r_0$ system as a special case of the following controlled equation, where we have suppressed $x$.

$$ \frac{dr_0}{dt} = \mathbf{F} r_0(t) + \sigma u(t), \qquad r_0(0) = 0. \tag{32} $$

The solution of this equation is given by

$$ r_0(t) = \int_0^t e^{\mathbf{F}(t-s)} \sigma u(s)\,ds = \int_0^t \sum_{n=0}^{\infty} \frac{(t - s)^n}{n!} \mathbf{F}^n \sigma u(s)\,ds. $$

This is a linear combination of vectors of the form $\mathbf{F}^n \sigma_i$, so we see that the smallest subspace $\mathcal{R}$ which contains $r_0(t)$ for all $t$ and for all choices of the input signal $u$ is given by

$$ \mathcal{R} = \operatorname{span}\left[\sigma, \mathbf{F}\sigma, \mathbf{F}^2\sigma, \ldots\right] = \operatorname{span}\left[\mathbf{F}^k \sigma_i;\ i = 1, \ldots, m;\ k = 0, 1, \ldots\right]. \tag{33} $$

We thus have the following result.

Proposition 2.12 Take the volatility function $\sigma = [\sigma_1, \ldots, \sigma_m]$ as given. Then the McMillan degree, $\mathcal{D}$, is given by

$$ \mathcal{D} = \dim(\mathcal{R}), \tag{34} $$

with $\mathcal{R}$ defined as in (33). The forward rate system thus admits a finite dimensional realization if and only if the space spanned by the components of $\sigma$ and all their derivatives is finite dimensional.
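For volatilities of the special form $\sigma(x) = p(x) e^{-ax}$ with $p$ a polynomial and a scalar driving noise, the dimension in (34) can be computed exactly, since each derivative is again a polynomial times $e^{-ax}$. The sketch below is a minimal illustration under that assumption: it represents such functions by their polynomial coefficients and computes the rank of the derivative sequence with exact rational arithmetic. The representation and the helper names are hypothetical.

```python
from fractions import Fraction

def differentiate(poly, a):
    """Coefficients of q where d/dx [p(x) e^{-a x}] = q(x) e^{-a x}, i.e. q = p' - a p.

    Polynomials are coefficient lists [c0, c1, ...] meaning c0 + c1 x + ...
    """
    n = len(poly)
    deriv = [(k + 1) * poly[k + 1] for k in range(n - 1)] + [Fraction(0)]
    return [deriv[k] - a * poly[k] for k in range(n)]

def rank(rows):
    """Rank of a list of equal-length rows by exact Gaussian elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [u - factor * v for u, v in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def mcmillan_degree(poly, a):
    """dim span{F^k sigma : k >= 0} for sigma(x) = p(x) e^{-a x}, scalar noise."""
    poly = [Fraction(c) for c in poly]
    a = Fraction(a)
    derivatives = [poly]
    # The span stabilizes after at most len(poly) derivatives.
    for _ in range(len(poly)):
        derivatives.append(differentiate(derivatives[-1], a))
    return rank(derivatives)

print(mcmillan_degree([1], Fraction(1, 2)))     # sigma(x) = e^{-x/2}
print(mcmillan_degree([0, 1], Fraction(1, 2)))  # sigma(x) = x e^{-x/2}
```

The rank of the coefficient vectors equals the dimension of the function space because the functions $x^k e^{-ax}$ are linearly independent.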

2.5 Economic interpretation of the state space

In general, the state space of the minimal realization of a given system has no concrete (e.g. physical) interpretation. In our case, however, the states of the minimal realization turn out to have a simple economic interpretation in terms of a minimal set of “benchmark” forward rates.

Assume that $[A, B, C]$ is a minimal realization, of dimension $n$, of the forward rates as in (21)–(22). Let us choose a set of “benchmark” maturities $x_1, \ldots, x_n$. We use the notation $\bar{x} = (x_1, \ldots, x_n)$. Assume furthermore that the maturity vector $\bar{x}$ is chosen so that the matrix

$$ T(\bar{x}) = \begin{bmatrix} C e^{A x_1} \\ \vdots \\ C e^{A x_n} \end{bmatrix} $$

is invertible. It can be shown (see Björk and Gombani (1999)) that, outside a set of measure zero, this can always be done as long as the maturities are distinct. We use the notation

$$ r_0(t, \bar{x}) = \begin{bmatrix} r_0(t, x_1) \\ \vdots \\ r_0(t, x_n) \end{bmatrix} $$

and corresponding interpretations for column vectors like $r(t, \bar{x})$, $\delta(t, \bar{x})$ etc. The following result shows how the entire term structure is determined by the benchmark forward rates.


Proposition 2.13 Assume that (21)–(22) is a minimal realization of the forward rates, and assume furthermore that a maturity vector $\bar{x} = (x_1, \ldots, x_n)$ is chosen as above. Then the following hold.

• With notation as above, the vector $r(t, \bar{x})$ of benchmark forward rates has the dynamics

$$ dr(t, \bar{x}) = \left\{ T(\bar{x}) A T^{-1}(\bar{x})\, r(t, \bar{x}) + \Psi(t, \bar{x}) \right\} dt + T(\bar{x}) B\,dW(t), \qquad r(0, \bar{x}) = r^o(0, \bar{x}), \tag{35} $$

where the deterministic function $\Psi$ is given by

$$ \Psi(t, \bar{x}) = \frac{\partial r^o}{\partial x}(0, t\bar{e} + \bar{x}) + D(t\bar{e} + \bar{x}) - T(\bar{x}) A T^{-1}(\bar{x})\, \delta(t, \bar{x}). $$

Here $\bar{e} \in R^n$ denotes the vector with unit components, i.e. $\bar{e} = (1, 1, \ldots, 1)^\top$.

• The system of benchmark forward rates determines the entire forward rate process according to the formula

$$ r(t, x) = C e^{Ax} T^{-1}(\bar{x})\, r(t, \bar{x}) - C e^{Ax} T^{-1}(\bar{x})\, \delta(t, \bar{x}) + \delta(t, x). \tag{36} $$

• The correspondence between $Z$ and $r$ is given by

$$ r_0(t, \bar{x}) = T(\bar{x}) Z(t). \tag{37} $$

Proof See Björk and Gombani (1999).

The conclusion is thus that the state variables of a minimal realization can be interpreted as an affine transformation of a vector of benchmark forward rates.

2.6 Examples

In this section we will give some simple illustrations of the theory. Note the handling of multiple roots of the matrix $A$, and the fact that the input noise can have dimension smaller than the dimension of $A$.

Example 2.14 $\sigma(x) = \sigma e^{-ax}$

We consider a model driven by a one-dimensional Wiener process, having the forward rate volatility structure

$$ \sigma(x) = \sigma e^{-ax}, $$

where $\sigma$ in the right hand side denotes a constant. (The reader will probably recognize this example as the Hull–White model.) We start by determining the McMillan degree $\mathcal{D}$, and by Proposition 2.12 we have $\mathcal{D} = \dim(\mathcal{R})$, where the space $\mathcal{R}$ is given by

$$ \mathcal{R} = \operatorname{span}\left[ \frac{d^k}{dx^k} \sigma e^{-ax};\ k \ge 0 \right]. $$

It is obvious that $\mathcal{R}$ is one dimensional, and that it is spanned by the single function $e^{-ax}$. Thus the McMillan degree is given by $\mathcal{D} = 1$.

We now want to apply Proposition 2.4 to find a realization, so we must factor the volatility function. In this case this is easy, since we have the trivial factorization $\sigma(x) = 1 \cdot e^{-ax} \cdot \sigma$. In the notation of Proposition 2.4 we thus have

$$ C_0 = 1, \qquad A = -a, \qquad B = \sigma. $$

A realization of the forward rates is thus given by

$$ dZ(t) = -a Z(t)\,dt + \sigma\,dW(t), $$
$$ r_0(t, x) = e^{-ax} Z(t), $$
$$ r(t, x) = r_0(t, x) + \delta(t, x), $$

and since the state space in this realization is of dimension one, the realization is minimal. We see that if $a > 0$ then the system is asymptotically stable.

We now go on to the interpretation of the state space, and since $\mathcal{D} = 1$ we can choose a single benchmark maturity. The canonical choice is of course $x_1 = 0$, i.e. we choose the instantaneous short rate $R(t)$ as the state variable. In the notation of Proposition 2.13 we then have $T(\bar{x}) = 1$, $r(t, \bar{x}) = R(t)$, and we get rate dynamics

$$ dR(t) = \{\Psi(t, 0) - a R(t)\}\,dt + \sigma\,dW(t). $$

Thus we see that we have indeed the Hull–White extension of the Vasiček model (1977). Note however that we do not have to choose the benchmark maturity as $x_1 = 0$. We can in fact choose any fixed maturity, $x_1$, and then use the corresponding forward rate as benchmark. By (35), with $T(\bar{x}) B = \sigma e^{-a x_1}$, this will give us the dynamics

$$ dr(t, x_1) = \{\Psi(t, x_1) - a\, r(t, x_1)\}\,dt + \sigma e^{-a x_1}\,dW(t), $$

and now the entire forward rate curve will be determined by the $x_1$-rate according to formula (36).

Example 2.15 $\sigma(x) = x e^{-ax}$

In this example we still have a single driving Wiener process, but the volatility function is now “hump-shaped”. By taking derivatives of $\sigma(x)$ we immediately see, from Proposition 2.12, that $\mathcal{R}$ is given by

$$ \mathcal{R} = \operatorname{span}\left[ x e^{-ax},\ e^{-ax} \right], $$

so in this case $\mathcal{D} = 2$, and we have a two-dimensional minimal state space. In order to obtain a realization we compute the transfer function $K(s, x)$, which is given by Lemma 2.9 as

$$ K(s, x) = \mathcal{L}\left[ (x + \cdot) e^{-a(x + \cdot)} \right](s). $$

An easy calculation gives us

$$ K(s, x) = \frac{x e^{-ax}}{a + s} + \frac{e^{-ax}}{(a + s)^2} = \frac{s x e^{-ax} + (1 + ax) e^{-ax}}{(a + s)^2}, $$

and we now look for a realization of this transfer function (for a fixed $x$). The obvious thing to do is to use the standard controllable realization (see Brockett (1970)), and we obtain

$$ C(x) = \left[ x e^{-ax},\ (1 + ax) e^{-ax} \right], \qquad A = \begin{bmatrix} -2a & -a^2 \\ 1 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}. $$

Since $\mathcal{D} = 2$ and this realization is two-dimensional we have a minimal realization, given by

$$ dZ_1(t) = -2a Z_1(t)\,dt - a^2 Z_2(t)\,dt + dW(t), $$
$$ dZ_2(t) = Z_1(t)\,dt, $$
$$ r_0(t, x) = x e^{-ax} Z_1(t) + (1 + ax) e^{-ax} Z_2(t), $$
$$ r(t, x) = r_0(t, x) + \delta(t, x). $$

We have a double eigenvalue of the system matrix $A$ at $\lambda_1 = -a$, so if $a > 0$ the system is asymptotically stable.
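The realization in Example 2.15 can be sanity-checked both in the time domain, where Proposition 2.4 requires $C(x) e^{At} B = \sigma(x + t)$, and in the frequency domain, where Lemma 2.10 requires $C(x)[sI - A]^{-1} B = K(s, x)$. The following sketch does both with a truncated power series for the matrix exponential and an explicit $2 \times 2$ inverse; the value of `a`, the truncation level and the helper names are illustrative assumptions.

```python
import math

a = 0.4  # hypothetical decay parameter

A = [[-2.0 * a, -a * a], [1.0, 0.0]]
B = [1.0, 0.0]

def C(x):
    """Output row vector C(x) of the standard controllable realization."""
    return [x * math.exp(-a * x), (1.0 + a * x) * math.exp(-a * x)]

def sigma(x):
    """Hump-shaped volatility sigma(x) = x exp(-a x)."""
    return x * math.exp(-a * x)

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_exp(M, t, n_terms=60):
    """Truncated power series for exp(M t), adequate for small 2x2 examples."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, n_terms):
        term = mat_mul(term, [[M[i][j] * t / n for j in range(2)] for i in range(2)])
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def output_kernel(x, t):
    """C(x) exp(A t) B, to be compared with sigma(x + t)."""
    E = mat_exp(A, t)
    EB = [E[0][0] * B[0] + E[0][1] * B[1], E[1][0] * B[0] + E[1][1] * B[1]]
    c = C(x)
    return c[0] * EB[0] + c[1] * EB[1]

def transfer_from_realization(s, x):
    """C(x) [s I - A]^{-1} B via the explicit 2x2 inverse."""
    m = [[s - A[0][0], -A[0][1]], [-A[1][0], s - A[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    v = [inv[0][0] * B[0] + inv[0][1] * B[1], inv[1][0] * B[0] + inv[1][1] * B[1]]
    c = C(x)
    return c[0] * v[0] + c[1] * v[1]

def transfer_closed_form(s, x):
    """Laplace transform of t -> (x + t) exp(-a (x + t))."""
    return x * math.exp(-a * x) / (a + s) + math.exp(-a * x) / (a + s) ** 2

time_gap = max(abs(output_kernel(x, t) - sigma(x + t))
               for x in (0.0, 1.0, 2.5) for t in (0.0, 0.5, 5.0))
freq_gap = max(abs(transfer_from_realization(s, x) - transfer_closed_form(s, x))
               for x in (0.0, 1.0, 2.5) for s in (0.1, 1.0, 2.0))
print(time_gap, freq_gap)
```

Both gaps are at the level of rounding error, consistent with the factorization claimed in the example.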

2.7 Notes

This section is mainly based on Björk and Gombani (1999). The first paper to appear in this area was to our knowledge the preprint Musiela (1993), where the Musiela parameterization and the space $\mathcal{R}$ are discussed in some detail. See also the closely related and interesting preprints El Karoui and Lacoste (1993), El Karoui, Geman and Lacoste (1997) and Zabczyk (1992). Because of the linear structure, the theory above is closely connected to (and in a sense inverse to) the theory of affine term structures developed in Duffie and Kan (1996). The standard reference on infinite dimensional SDEs is Da Prato and Zabczyk (1992), where one also can find a presentation of the connections between control theory and infinite dimensional linear stochastic equations.

3 Invariant manifolds

In this section we study when a given submanifold of forward rate curves is invariant under the action of a given interest rate model. This problem is of interest from an applied as well as from a theoretical point of view. In particular we will use the results from this section to analyze problems about existence of finite dimensional factor realizations for interest rate models on forward rate form. Invariant manifolds are, however, also of interest in their own right, so we begin by discussing a concrete problem which naturally leads to the invariance concept.

3.1 Parameter recalibration

A standard procedure when dealing with concrete interest rate models on a high frequency (say, daily) basis can be described as follows:

1. At time $t = 0$, use market data to fit (calibrate) the model to the observed bond prices.
2. Use the calibrated model to compute prices of various interest rate derivatives.
3. The following day ($t = 1$), repeat the procedure in 1 above in order to recalibrate the model, etc.

To carry out the calibration in step 1 above, the analyst typically has to produce a forward rate curve $\{r^o(0, x);\ x \ge 0\}$ from the observed data. However, since only a finite number of bonds actually trade in the market, the data consist of a discrete set of points, and a need to fit a curve to these points arises. This curve-fitting may be done in a variety of ways. One way is to use splines, but also a number of parameterized families of smooth forward rate curves have become popular in applications, the most well-known probably being the Nelson–Siegel (see Nelson and Siegel (1987)) family. Once the curve $\{r^o(0, x);\ x \ge 0\}$ has been obtained, the parameters of the interest rate model may be calibrated to this.

Now, from a purely logical point of view, the recalibration procedure in step 3 above is of course slightly nonsensical: if the interest rate model at hand is an exact picture of reality, then there should be no need to recalibrate. The reason that everyone insists on recalibrating is of course that any model in fact is only an approximate picture of the financial market under consideration, and recalibration allows the incorporation of newly arrived information in the approximation. Even so, the calibration procedure itself ought to take into account that it will be repeated. It appears that the optimal way to do so would involve a combination of time series and cross-section data, as opposed to the purely cross-sectional curve-fitting, where the information contained in previous curves is discarded in each recalibration.

The cross-sectional fitting of a forward curve and the repeated recalibration is thus, in a sense, a pragmatic and somewhat non-theoretical endeavor. Nonetheless, there are some nontrivial theoretical problems to be dealt with in this context, and the problem to be studied in this section concerns the consistency between, on the one hand, the dynamics of a given interest rate model, and, on the other hand, the forward curve family employed.

What, then, is meant by consistency in this context? Assume that a given interest rate model $\mathcal{M}$ (e.g. the Hull–White model (1990)) in fact is an exact picture of the financial market. Now consider a particular family $\mathcal{G}$ of forward rate curves (e.g. the Nelson–Siegel family) and assume that the interest rate model is calibrated using this family. We then say that the pair $(\mathcal{M}, \mathcal{G})$ is consistent (or, that $\mathcal{M}$ and $\mathcal{G}$ are consistent) if all forward curves which may be produced by the interest rate model $\mathcal{M}$ are contained within the family $\mathcal{G}$. Otherwise, the pair $(\mathcal{M}, \mathcal{G})$ is inconsistent.

Thus, if $\mathcal{M}$ and $\mathcal{G}$ are consistent, then the interest rate model actually produces forward curves which belong to the relevant family. In contrast, if $\mathcal{M}$ and $\mathcal{G}$ are inconsistent, then the interest rate model will produce forward curves outside the family used in the calibration step, and this will force the analyst to change the model parameters all the time, not because the model is an approximation to reality, but simply because the family does not go well with the model.

Put into more operational terms this can be rephrased as follows.

• Suppose that you are using a fixed interest rate model $\mathcal{M}$. If you want to do recalibration, then your family $\mathcal{G}$ of forward rate curves should be chosen in such a way as to be consistent with the model $\mathcal{M}$.

Note however that the argument can also be run backwards, yielding the following conclusion for empirical work.

• Suppose that a particular forward curve family $\mathcal{G}$ has been observed to provide a good fit, on a day-to-day basis, in a particular bond market. Then this gives you modeling information about the choice of an interest rate model in the sense that you should try to use/construct an interest rate model which is consistent with the family $\mathcal{G}$.

We now have a number of natural problems to study.

I Given an interest rate model $\mathcal{M}$ and a family of forward curves $\mathcal{G}$, what are necessary and sufficient conditions for consistency?
II Take as given a specific family $\mathcal{G}$ of forward curves (e.g. the Nelson–Siegel family). Does there exist any interest rate model $\mathcal{M}$ which is consistent with $\mathcal{G}$?
III Take as given a specific interest rate model $\mathcal{M}$ (e.g. the Hull–White model). Does there exist any finitely parameterized family of forward curves $\mathcal{G}$ which is consistent with $\mathcal{M}$?

In this section we will mainly address Problem I above. Problem II has been studied, for special cases, in Filipović (1998a,b), whereas Problem III can be shown (see Proposition 4.6) to be equivalent to the problem of finding a finite dimensional factor realization of the model $\mathcal{M}$, and we provide a fairly complete solution in Section 4.

3.2 Invariant manifolds

We now move on to give a precise mathematical definition of the consistency property discussed above, and this leads us to the concept of an invariant manifold.

Definition 3.1 (Invariant manifold) Take as given the forward rate process dynamics (2). Consider also a fixed family (manifold) of forward rate curves $\mathcal{G}$. We say that $\mathcal{G}$ is locally invariant under the action of $r$ if, for each point $(s, r) \in R_+ \times \mathcal{G}$, the condition $r_s \in \mathcal{G}$ implies that $r_t \in \mathcal{G}$, on a time interval with positive length. If $r$ stays forever on $\mathcal{G}$, we say that $\mathcal{G}$ is globally invariant.

The purpose of this section is to characterize invariance in terms of local characteristics of $\mathcal{G}$ and $\mathcal{M}$, and in this context local invariance is the best one can hope for. In order to save space, local invariance will therefore be referred to as invariance.


To get some intuitive feeling for the invariance concepts one can consider the following two-dimensional deterministic system

$$ \frac{dy_1}{dt} = y_2, \qquad \frac{dy_2}{dt} = -y_1. $$

For this system it is obvious that the unit circle $C = \{(y_1, y_2) : y_1^2 + y_2^2 = 1\}$ is globally invariant, i.e. if we start the system on $C$ it will stay forever on $C$. The ‘upper half’ of the circle, $C_u = \{(y_1, y_2) : y_1^2 + y_2^2 = 1,\ y_2 > 0\}$, is on the other hand only locally invariant, since the system will leave $C_u$ at the point $(1, 0)$. This geometric situation is in fact the generic one also for our infinite dimensional stochastic case. The forward rate trajectory will never leave a locally invariant manifold at a point in the relative interior of the manifold. Exit from the manifold can only take place at the relative boundary points. We have no general method for determining whether a locally invariant manifold is also globally invariant or not. Problems of this kind have to be solved separately for each particular case.
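The difference between global and local invariance in this toy system can be made concrete with the exact flow, which is a clockwise rotation. The sketch below is only an illustration: the starting point, the time grid and the tolerance are arbitrary choices, and the function names are hypothetical.

```python
import math

def flow(y, t):
    """Exact solution at time t of dy1/dt = y2, dy2/dt = -y1 started at y."""
    y1, y2 = y
    return (y1 * math.cos(t) + y2 * math.sin(t),
            -y1 * math.sin(t) + y2 * math.cos(t))

def on_circle(y, tol=1e-12):
    """Membership in the unit circle C, up to rounding error."""
    return abs(y[0] ** 2 + y[1] ** 2 - 1.0) < tol

def on_upper_half(y, tol=1e-12):
    """Membership in the open upper half circle C_u."""
    return on_circle(y, tol) and y[1] > 0.0

start = (0.0, 1.0)                        # a point on the upper half circle
times = [0.01 * k for k in range(1001)]   # grid on [0, 10]

stays_on_circle = all(on_circle(flow(start, t)) for t in times)
exit_time = next(t for t in times if not on_upper_half(flow(start, t)))
exit_point = flow(start, math.pi / 2.0)
print(stays_on_circle, exit_time, exit_point)
```

The trajectory never leaves $C$, while it leaves $C_u$ shortly after $t = \pi/2$, and the point reached at $t = \pi/2$ is the boundary point $(1, 0)$ up to rounding.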

3.3 The formalized problem

3.3.1 The Space

As our basic space of forward rate curves we will use a weighted Sobolev space, where a generic point will be denoted by $r$.

Definition 3.2 Consider a fixed real number $\gamma > 0$. The space $\mathcal{H}_\gamma$ is defined as the space of all differentiable (in the distributional sense) functions $r : R_+ \to R$ satisfying the norm condition $\|r\|_\gamma < \infty$. Here the norm is defined as

$$ \|r\|_\gamma^2 = \int_0^\infty r^2(x) e^{-\gamma x}\,dx + \int_0^\infty \left( \frac{dr}{dx}(x) \right)^2 e^{-\gamma x}\,dx. $$

Remark 3.3 The variable $x$ is as before interpreted as time to maturity. With the inner product

$$ (r, q) = \int_0^\infty r(x) q(x) e^{-\gamma x}\,dx + \int_0^\infty \frac{dr}{dx}(x) \frac{dq}{dx}(x) e^{-\gamma x}\,dx, $$

the space $\mathcal{H}_\gamma$ becomes a Hilbert space. Because of the exponential weighting function all constant forward rate curves will belong to the space. In the sequel we will suppress the subindex $\gamma$, writing $\mathcal{H}$ instead of $\mathcal{H}_\gamma$.
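To see the effect of the exponential weight, the squared norm can be approximated numerically for simple curves. The sketch below, which is only an illustration with hypothetical parameter values and a truncated Simpson quadrature, confirms that a constant curve $r \equiv c$ has squared norm $c^2/\gamma$ and that $r(x) = e^{-bx}$ has squared norm $(1 + b^2)/(\gamma + 2b)$.

```python
import math

def weighted_norm_squared(r, dr, gamma, horizon=100.0, n=20000):
    """Simpson approximation of int_0^inf (r(x)^2 + r'(x)^2) exp(-gamma x) dx,
    truncated at `horizon`. `dr` is the derivative of `r`."""
    h = horizon / n
    f = lambda x: (r(x) ** 2 + dr(x) ** 2) * math.exp(-gamma * x)
    total = f(0.0) + f(horizon)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

gamma, c, b = 0.5, 0.05, 1.0  # hypothetical values

flat = weighted_norm_squared(lambda x: c, lambda x: 0.0, gamma)
decaying = weighted_norm_squared(lambda x: math.exp(-b * x),
                                 lambda x: -b * math.exp(-b * x), gamma)
print(flat, c * c / gamma)
print(decaying, (1.0 + b * b) / (gamma + 2.0 * b))
```

Without the weight $e^{-\gamma x}$ the first integral would diverge for every nonzero constant curve.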


3.3.2 The Forward Curve Manifold

We consider as given a mapping

$$G : \mathcal{Z} \to H, \tag{38}$$

where the parameter space $\mathcal{Z}$ is an open connected subset of $\mathbb{R}^d$, i.e. for each parameter value $z \in \mathcal{Z} \subseteq \mathbb{R}^d$ we have a curve $G(z) \in H$. The value of this curve at the point $x \in \mathbb{R}_+$ will be written as $G(z, x)$, so we see that $G$ can also be viewed as a mapping

$$G : \mathcal{Z} \times \mathbb{R}_+ \to \mathbb{R}. \tag{39}$$

The mapping $G$ is thus a formalization of the idea of a finitely parameterized family of forward rate curves, and we now define the forward curve manifold as the set of all forward rate curves produced by this family.

Definition 3.4 The forward curve manifold $\mathcal{G} \subseteq H$ is defined as $\mathcal{G} = \operatorname{Im}[G]$.

3.3.3 The Interest Rate Model

We take as given a volatility function $\sigma$ of the form $\sigma : H \times \mathbb{R}_+ \to \mathbb{R}^m$, i.e. $\sigma(r, x)$ is a functional of the infinite dimensional $r$-variable, and a function of the real variable $x$. Denoting the forward rate curve at time $t$ by $r_t$ we then have the following forward rate equation.

$$dr_t(x) = \left\{ \frac{\partial}{\partial x} r_t(x) + \sigma(r_t, x) \int_0^x \sigma(r_t, u)^\top du \right\} dt + \sigma(r_t, x)\,dW_t. \tag{40}$$

Remark 3.5 For notational simplicity we have assumed that the $r$-dynamics are time homogeneous. The case when $\sigma$ is of the form $\sigma(t, r, x)$ can be treated in exactly the same way. See Björk and Christensen (1999).

We need some regularity assumptions, and the main ones are as follows. See Björk (1997) for technical details.

Assumption 3.6 We assume the following.
• The volatility mapping $r \mapsto \sigma(r)$ is smooth.
• The mapping $z \mapsto G(z)$ is a smooth embedding, so in particular the Fréchet derivative $G'_z(z)$ is injective for all $z \in \mathcal{Z}$.
• For every initial point $r_0 \in \mathcal{G}$, there exists a unique strong solution in $H$ of Equation (40).


3.3.4 The Problem

Our main problem is the following.
• Suppose that we are given
  – a volatility $\sigma$, specifying an interest rate model $\mathcal{M}$ as in (40),
  – a mapping $G$, specifying a forward curve manifold $\mathcal{G}$.
• Is $\mathcal{G}$ then invariant under the action of $r$?

3.4 The invariance conditions

In order to study the invariance problem we need to introduce some compact notation.

Definition 3.7 We define $H\sigma$ by

$$H\sigma(r, x) = \int_0^x \sigma(r, s)\,ds.$$

Suppressing the $x$-variable, the Itô dynamics for the forward rates are thus given by

$$dr_t = \left\{ \frac{\partial}{\partial x} r_t + \sigma(r_t) H\sigma(r_t)^\top \right\} dt + \sigma(r_t)\,dW_t \tag{41}$$

and we write this more compactly as

$$dr_t = \mu_0(r_t)\,dt + \sigma(r_t)\,dW_t, \tag{42}$$

where the drift $\mu_0$ is given by the bracket term in (41). To get some intuition we now formally "divide by $dt$" and obtain

$$\frac{dr}{dt} = \mu_0(r_t) + \sigma(r_t)\dot{W}_t, \tag{43}$$

where the formal time derivative $\dot{W}_t$ is interpreted as an "input signal" chosen by chance. As in Section 2.3 we are thus led to study the associated deterministic control system

$$\frac{dr}{dt} = \mu_0(r_t) + \sigma(r_t) u_t. \tag{44}$$

The intuitive idea is now that $\mathcal{G}$ is invariant under (42) if and only if $\mathcal{G}$ is invariant under (44) for all choices of the input signal $u$. It is furthermore geometrically obvious that this happens if and only if the velocity vector $\mu_0(r) + \sigma(r)u$ is tangential to $\mathcal{G}$ for all points $r \in \mathcal{G}$ and all choices of $u \in \mathbb{R}^m$. Since the tangent space of $\mathcal{G}$ at a point $G(z)$ is given by $\operatorname{Im}[G'_z(z)]$, where $G'_z$ denotes the Fréchet derivative (Jacobian), we are led to conjecture that $\mathcal{G}$ is invariant if and only if the condition

$$\mu_0(r) + \sigma(r)u \in \operatorname{Im}[G'_z(z)]$$

is satisfied for all $u \in \mathbb{R}^m$. This can also be written

$$\mu_0(r) \in \operatorname{Im}[G'_z(z)], \qquad \sigma(r) \in \operatorname{Im}[G'_z(z)],$$

where the last inclusion is interpreted componentwise for $\sigma$.

This "result" is, however, not correct due to the fact that the argument above neglects the difference between ordinary calculus, which is used for (44), and Itô calculus, which governs (42). In order to bridge this gap we have to rewrite the analysis in terms of Stratonovich integrals instead of Itô integrals.

Definition 3.8 For given semimartingales $X$ and $Y$, the Stratonovich integral of $X$ with respect to $Y$, $\int_0^t X(s) \circ dY(s)$, is defined as

$$\int_0^t X_s \circ dY_s = \int_0^t X_s\,dY_s + \frac{1}{2} \langle X, Y \rangle_t. \tag{45}$$

The first term on the rhs is the Itô integral. In the present case, with only Wiener processes as driving noise, we can define the "quadratic variation process" $\langle X, Y \rangle$ in (45) by

$$d\langle X, Y \rangle_t = dX_t\,dY_t, \tag{46}$$

with the usual "multiplication rules" $dW \cdot dt = dt \cdot dt = 0$, $dW \cdot dW = dt$. We now recall the main result and raison d'être for the Stratonovich integral.

Proposition 3.9 (Chain rule) Assume that the function $F(t, y)$ is smooth. Then we have

$$dF(t, Y_t) = \frac{\partial F}{\partial t}(t, Y_t)\,dt + \frac{\partial F}{\partial y}(t, Y_t) \circ dY_t. \tag{47}$$

Thus, in the Stratonovich calculus, the Itô formula takes the form of the standard chain rule of ordinary calculus. Returning to (42), the Stratonovich dynamics are given by

$$dr_t = \left\{ \frac{\partial}{\partial x} r_t + \sigma(r_t) H\sigma(r_t)^\top \right\} dt - \frac{1}{2}\, d\langle \sigma(r_t), W_t \rangle + \sigma(r_t) \circ dW_t. \tag{48}$$
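The difference between the two integrals in Definition 3.8 can be made concrete for $X = Y = W$, where (45) gives $\int_0^t W \circ dW = \int_0^t W\,dW + t/2$. The sketch below is an illustration only, not taken from the text: the step count, horizon and seed are arbitrary assumptions. It approximates the Itô integral by left-endpoint sums and the Stratonovich integral by averaging the two endpoints on one simulated path.

```python
import math
import random

def ito_and_stratonovich(n_steps=100_000, horizon=1.0, seed=0):
    """Discretize int_0^T W dW on one simulated Wiener path in two ways."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    w = 0.0      # current value of the Wiener path
    ito = 0.0    # left-endpoint (Ito) sum
    strat = 0.0  # endpoint-average (Stratonovich) sum
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito += w * dw
        strat += (w + 0.5 * dw) * dw  # (W_i + W_{i+1}) / 2 times the increment
        w += dw
    return w, ito, strat

w_T, ito, strat = ito_and_stratonovich()
print("Stratonovich sum:", strat, " W_T^2 / 2:", 0.5 * w_T ** 2)
print("Stratonovich minus Ito:", strat - ito, " T / 2:", 0.5)
```

The endpoint-average sum telescopes to $W_T^2/2$, which is what the ordinary chain rule (47) predicts for $F(y) = y^2/2$, while the gap between the two sums is half the realized quadratic variation and so is close to $T/2$.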


In order to compute the Stratonovich correction term above we use the infinite dimensional Itô formula (see Da Prato and Zabczyk (1992)) to obtain

$$d\sigma(r_t) = \{\cdots\}\,dt + \sigma'_r(r_t)\sigma(r_t)\,dW_t, \tag{49}$$

where $\sigma'_r$ denotes the Fréchet derivative of $\sigma$ w.r.t. the infinite dimensional $r$-variable. From this we immediately obtain

$$d\langle \sigma(r_t), W_t \rangle = \sigma'_r(r_t)\sigma(r_t)\,dt. \tag{50}$$

Remark 3.10 If the Wiener process $W$ is multidimensional, then $\sigma$ is a vector $\sigma = [\sigma_1, \ldots, \sigma_m]$, and the rhs of (50) should be interpreted as

$$\sigma'_r(r_t)\sigma(r_t) = \sum_{i=1}^m \sigma'_{ir}(r_t)\sigma_i(r_t).$$

Thus (48) becomes

$$dr_t = \left\{ \frac{\partial}{\partial x} r_t + \sigma(r_t) H\sigma(r_t)^\top - \frac{1}{2} \sigma'_r(r_t)\sigma(r_t) \right\} dt + \sigma(r_t) \circ dW_t. \tag{51}$$

We now write (51) as

$$dr_t = \mu(r_t)\,dt + \sigma(r_t) \circ dW_t, \tag{52}$$

where

$$\mu(r, x) = \frac{\partial}{\partial x} r(x) + \sigma(r, x) \int_0^x \sigma(r, u)^\top du - \frac{1}{2} \left[ \sigma'_r(r)\sigma(r) \right](x). \tag{53}$$

Given the heuristics above, our main result is not surprising. The formal proof, which is somewhat technical, is left out. See Björk and Christensen (1999).

Theorem 3.11 (Main theorem) The forward curve manifold $\mathcal{G}$ is locally invariant for the forward rate process $r(t, x)$ in $\mathcal{M}$ if and only if

$$G'_x(z) + \sigma(r) H\sigma(r)^\top - \frac{1}{2} \sigma'_r(r)\sigma(r) \in \operatorname{Im}[G'_z(z)], \tag{54}$$

$$\sigma(r) \in \operatorname{Im}[G'_z(z)], \tag{55}$$

hold for all $z \in \mathcal{Z}$ with $r = G(z)$.

Here, $G'_z$ and $G'_x$ denote the Fréchet derivatives of $G$ with respect to $z$ and $x$, respectively. The condition (55) is interpreted componentwise for $\sigma$. Condition (54) is called the consistent drift condition, and (55) is called the consistent volatility condition.


Remark 3.12 It is easily seen that if the family $\mathcal{G}$ is invariant under shifts in the $x$-variable, then we will automatically have the relation

$$G'_x(z) \in \operatorname{Im}[G'_z(z)],$$

so in this case the relation (54) can be replaced by

$$\sigma(r) H\sigma(r)^\top - \frac{1}{2} \sigma'_r(r)\sigma(r) \in \operatorname{Im}[G'_z(z)],$$

with $r = G(z)$ as usual.

3.5 Examples

The results above are extremely easy to apply in concrete situations. As a test case we consider the Nelson–Siegel (see Nelson and Siegel (1987)) family of forward rate curves. We analyze the consistency of this family with the Ho–Lee and Hull–White interest rate models. It should be emphasized that these examples are chosen only in order to illustrate the general methodology. For more examples and details, see Björk and Christensen (1999).

3.5.1 The Nelson–Siegel family

The Nelson–Siegel (henceforth NS) forward curve manifold $\mathcal{G}$ is parameterized by $z \in \mathbb{R}^4$, with the curve $x \mapsto G(z, x)$ given by

$$G(z, x) = z_1 + z_2 e^{-z_4 x} + z_3 x e^{-z_4 x}. \tag{56}$$

For $z_4 \neq 0$, the Fréchet derivatives are easily obtained as

$$G'_z(z, x) = \left[ 1,\ e^{-z_4 x},\ x e^{-z_4 x},\ -(z_2 + z_3 x) x e^{-z_4 x} \right], \tag{57}$$

$$G'_x(z, x) = (z_3 - z_2 z_4 - z_3 z_4 x) e^{-z_4 x}. \tag{58}$$

In order for the image of this map to be included in $H_\gamma$, we need to impose the condition $z_4 > -\gamma/2$. In this case, the natural parameter space is thus $\mathcal{Z} = \{ z \in \mathbb{R}^4 : z_4 \neq 0,\ z_4 > -\gamma/2 \}$. However, as we shall see below, the results are uniform w.r.t. $\gamma$. Note that the mapping $G$ indeed is smooth, and for $z_4 \neq 0$, $G$ and $G'_z$ are also injective. In the degenerate case $z_4 = 0$, we have

$$G(z, x) = z_1 + z_2 + z_3 x. \tag{59}$$

We return to this case below.
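The derivative formulas (57) and (58) are easy to get wrong by a sign, so a numerical sanity check can be useful. The following sketch is an addition to the text: the parameter vector, the evaluation point and the finite-difference step are arbitrary illustrative assumptions. It compares the closed-form derivatives with central finite differences of $G$.

```python
import math

def G(z, x):
    """Nelson-Siegel forward curve (56)."""
    z1, z2, z3, z4 = z
    return z1 + z2 * math.exp(-z4 * x) + z3 * x * math.exp(-z4 * x)

def G_z(z, x):
    """Closed-form gradient with respect to z, as in (57)."""
    z1, z2, z3, z4 = z
    e = math.exp(-z4 * x)
    return [1.0, e, x * e, -(z2 + z3 * x) * x * e]

def G_x(z, x):
    """Closed-form derivative with respect to x, as in (58)."""
    z1, z2, z3, z4 = z
    return (z3 - z2 * z4 - z3 * z4 * x) * math.exp(-z4 * x)

def central_diff(f, t, h=1e-6):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

z = [0.05, -0.02, 0.01, 0.7]  # illustrative parameter values (assumed)
x = 2.5                        # illustrative time to maturity (assumed)

# Differentiate G numerically in each parameter z_i and in x.
num_z = [central_diff(lambda t, i=i: G(z[:i] + [t] + z[i + 1:], x), z[i])
         for i in range(4)]
num_x = central_diff(lambda t: G(z, t), x)

err_z = max(abs(a - b) for a, b in zip(G_z(z, x), num_z))
err_x = abs(G_x(z, x) - num_x)
print("largest error in G'_z:", err_z)
print("error in G'_x:", err_x)
```

Both errors should be at the level of the finite-difference truncation error, which supports the reconstruction of (57) and (58) used here.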


3.5.2 The Hull–White and Ho–Lee models

As our test case, we analyze the Hull and White (1990) (henceforth HW) extension of the Vasiček model. On short rate form the model is given by

$$dR(t) = \{\Phi(t) - aR(t)\}\,dt + \sigma\,dW(t), \tag{60}$$

where $a, \sigma > 0$. As is well known, the corresponding forward rate formulation is

$$dr(t, x) = \beta(t, x)\,dt + \sigma e^{-ax}\,dW_t. \tag{61}$$

Thus, the volatility function is given by $\sigma(x) = \sigma e^{-ax}$, and the conditions of Theorem 3.11 become

$$G'_x(z, x) + \frac{\sigma^2}{a} \left[ e^{-ax} - e^{-2ax} \right] \in \operatorname{Im}[G'_z(z, x)], \tag{62}$$

$$\sigma e^{-ax} \in \operatorname{Im}[G'_z(z, x)]. \tag{63}$$

To investigate whether the NS manifold is invariant under HW dynamics, we start with (63) and fix a $z$-vector. We then look for constants (possibly depending on $z$) $A$, $B$, $C$, and $D$, such that for all $x \geq 0$ we have

$$\sigma e^{-ax} = A + B e^{-z_4 x} + C x e^{-z_4 x} - D (z_2 + z_3 x) x e^{-z_4 x}. \tag{64}$$

This is possible if and only if $z_4 = a$, and since (63) must hold for all choices of $z \in \mathcal{Z}$ we immediately see that HW is inconsistent with the full NS manifold (see also the Notes below).

Proposition 3.13 (Nelson–Siegel and Hull–White) The Hull–White model is inconsistent with the NS family.

We have thus obtained a negative result for the HW model. The NS manifold is "too small" for HW, in the sense that if the initial forward rate curve is on the manifold, then the HW dynamics will force the term structure off the manifold within an arbitrarily short period of time. For more positive results see Björk and Christensen (1999).

Remark 3.14 It is an easy exercise to see that the minimal manifold which is consistent with HW is given by

$$G(z, x) = z_1 e^{-ax} + z_2 e^{-2ax}.$$

In the same way, one may easily test the consistency between NS and the model obtained by setting $a = 0$ in (60). This is the continuous time limit of the Ho and Lee model (Ho and Lee (1986)), and is henceforth referred to as HL. Since we have a pedagogical point to make, we give the results on consistency, which are as follows.


Proposition 3.15 (Nelson–Siegel and Ho–Lee)
(a) The full NS family is inconsistent with the Ho–Lee model.
(b) The degenerate family $G(z, x) = z_1 + z_3 x$ is in fact consistent with Ho–Lee.

Remark 3.16 We see that the minimal invariant manifold provides information about the model. From the result above, the HL model is closely tied to the class of affine forward rate curves. Such curves are unrealistic from an economic point of view, implying that the HL model is overly simplistic.
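The tangency conditions of Theorem 3.11 are span-membership statements, so they can be probed numerically on a grid of maturities. The sketch below is an illustration under stated assumptions and is not the method of the text: the grid on $[0, 10]$, the parameter values and the helper names `relative_residual`, `ns_tangent_basis` and `hw_vol` are all hypothetical. It measures the distance from a target function to the span of the tangent vectors by Gram-Schmidt projection. A residual near zero is consistent with membership, while a clearly positive residual rules it out on the grid.

```python
import math

def relative_residual(target, basis):
    """Relative distance from `target` to span(basis); arguments are grid values."""
    ortho = []
    for v in basis:  # modified Gram-Schmidt on the basis vectors
        v = list(v)
        for q in ortho:
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # drop numerically dependent directions
            ortho.append([a / norm for a in v])
    r = list(target)
    for q in ortho:  # remove the component of the target lying in the span
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    return math.sqrt(sum(a * a for a in r)) / math.sqrt(sum(t * t for t in target))

xs = [10.0 * k / 199 for k in range(200)]  # maturity grid on [0, 10] (assumed)
sigma = 0.01                                # illustrative volatility level (assumed)

def ns_tangent_basis(z):
    """Grid values of the four components of G'_z in (57)."""
    z1, z2, z3, z4 = z
    e = [math.exp(-z4 * x) for x in xs]
    return [[1.0] * len(xs), e,
            [x * ex for x, ex in zip(xs, e)],
            [-(z2 + z3 * x) * x * ex for x, ex in zip(xs, e)]]

def hw_vol(a):
    """Hull-White forward rate volatility sigma * exp(-a x) on the grid."""
    return [sigma * math.exp(-a * x) for x in xs]

z = [0.05, -0.02, 0.01, 1.0]  # illustrative NS parameters with z4 = 1 (assumed)
hw_match = relative_residual(hw_vol(1.0), ns_tangent_basis(z))     # a equals z4
hw_mismatch = relative_residual(hw_vol(0.1), ns_tangent_basis(z))  # a differs from z4

# Ho-Lee (constant volatility) with the affine family G = z1 + z3 x:
# the tangent space is spanned by 1 and x, the drift term is z3 + sigma^2 x.
affine_basis = [[1.0] * len(xs), list(xs)]
z3 = 0.01
hl_drift = relative_residual([z3 + sigma ** 2 * x for x in xs], affine_basis)
hl_vol = relative_residual([sigma] * len(xs), affine_basis)
print(hw_match, hw_mismatch, hl_drift, hl_vol)
```

In this setup the Hull–White volatility lies in the NS tangent space only when $a = z_4$, which mirrors the argument around (64), whereas for Ho–Lee both the drift $G'_x + \sigma^2 x = z_3 + \sigma^2 x$ and the volatility $\sigma$ lie in $\operatorname{span}\{1, x\}$, in line with Proposition 3.15(b). A grid check of this kind is only a heuristic and does not replace the algebraic argument.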

3.6 Notes

The section is based on Björk and Christensen (1999). As we very easily detected above, neither the HW nor the HL model is consistent with the Nelson–Siegel family of forward rate curves. A much more difficult problem is to determine whether any interest rate model is. This is Problem II in Section 3.1 for the NS family, and it has been solved recently (using different techniques) in Filipović (1998a), where it is shown that no nontrivial Wiener driven model is consistent with NS. Thus, for a model to be consistent with Nelson–Siegel, it must be deterministic. In Filipović (1998b) (which is a technical tour de force) this result is extended to a much larger exponential polynomial family than the NS family.

In our presentation we have used strong solutions of the infinite dimensional forward rate SDE. This is of course restrictive. The invariance problem for weak solutions has recently been studied in Filipović (1999). An alternative way of studying invariance is by using some version of the Stroock–Varadhan support theorem, and this line of thought is carried out in depth in Zabczyk (1992).

4 Existence of nonlinear realizations

We now turn to Problem 2 in Section 1.2, i.e. the problem of when a given forward rate model has a finite dimensional factor realization. For ease of exposition we mostly confine ourselves to a discussion of the case of a single driving Wiener process and to time invariant forward rate dynamics. Multidimensional Wiener processes and time varying systems can be treated similarly, and for completeness we state the results for the multidimensional case. We will use some ideas and concepts from differential geometry, and a general reference here is Warner (1979). The section is based on Björk and Svensson (1999).


4.1 Setup

In order to study the realization problem we need (see Remark 4.4) a very regular space to work in.

Definition 4.1 Consider a fixed real number $\gamma > 0$. The space $B_\gamma$ is defined as the space of all infinitely differentiable functions $r : \mathbb{R}_+ \to \mathbb{R}$ satisfying the norm condition $\|r\|_\gamma < \infty$. Here the norm is defined as

$$\|r\|_\gamma^2 = \sum_{n=0}^\infty 2^{-n} \int_0^\infty \left( \frac{d^n r}{dx^n}(x) \right)^2 e^{-\gamma x}\,dx.$$

Note that $B$ is not a space of distributions, but a space of functions. As with $H$ we will often suppress the subindex $\gamma$. With the obvious inner product $B$ is a pre-Hilbert space, and in Björk and Svensson (1999) the following result is proved.

Proposition 4.2 The space $B$ is a Hilbert space, i.e. it is complete. Furthermore, every function in the space is in fact real analytic, and can thus be uniquely extended to a holomorphic function in the entire complex plane.

We now take as given a volatility $\sigma : B \to B$ and consider the induced forward rate model (on Stratonovich form)

$$dr_t = \mu(r_t)\,dt + \sigma(r_t) \circ dW_t, \tag{65}$$

where as before (see Section 3.4)

$$\mu(r) = \frac{\partial}{\partial x} r + \sigma(r) H\sigma(r)^\top - \frac{1}{2} \sigma'_r(r)\sigma(r). \tag{66}$$

We need some regularity assumptions.

Assumption 4.3 We assume that $\sigma$ is chosen such that the following hold.
• The mapping $\sigma$ is smooth.
• The mapping $r \mapsto \sigma(r) H\sigma(r)^\top - \frac{1}{2} \sigma'_r(r)\sigma(r)$ is a smooth map from $B$ to $B$.

Remark 4.4 The reason for our choice of $B$ as the underlying space is that the linear operator $\mathbf{F} = d/dx$ is bounded in this space. Together with the assumptions above, this implies that both $\mu$ and $\sigma$ are smooth vector fields on $B$, thus ensuring the existence of a strong local solution to the forward rate equation for every initial point $r^o \in B$.

4.2 The geometric problem

Given a specification of the volatility mapping $\sigma$, and an initial forward rate curve $r^o$, we now investigate when (and how) the corresponding forward rate process possesses a finite dimensional realization. We are thus looking for smooth $d$-dimensional vector fields $a$ and $b$, an initial point $z_0 \in \mathbb{R}^d$, and a mapping $G : \mathbb{R}^d \to B$ such that $r$, locally in time, has the representation

$$dZ_t = a(Z_t)\,dt + b(Z_t)\,dW_t, \qquad Z_0 = z_0, \tag{67}$$

$$r(t, x) = G(Z_t, x). \tag{68}$$

Remark 4.5 Let us clarify some points. Firstly, note that in principle it may well happen that, given a specification of $\sigma$, the $r$-model has a finite dimensional realization given a particular initial forward rate curve $r^o$, while being infinite dimensional for all other initial forward rate curves in a neighborhood of $r^o$. We say that such a model is a non-generic or accidental finite dimensional model. If, on the other hand, $r$ has a finite dimensional realization for all initial points in a neighborhood of $r^o$, then we say that the model is a generically finite dimensional model. In this text we are solely concerned with the generic problem. Secondly, let us emphasize that we are looking for local (in time) realizations.

We can now connect the realization problem to our studies of invariant manifolds.

Proposition 4.6 The forward rate process possesses a finite dimensional realization if and only if there exists an invariant finite dimensional submanifold $\mathcal{G}$ with $r^o \in \mathcal{G}$.

Proof See Björk and Christensen (1999) for the full proof. The intuitive argument runs as follows. Suppose that there exists a finite dimensional invariant manifold $\mathcal{G}$ with $r^o \in \mathcal{G}$. Then $\mathcal{G}$ has a local coordinate system, and we may define the $Z$ process as the local coordinate process for the $r$-process. On the other hand it is clear that if $r$ has a finite dimensional realization as in (67)–(68), then every forward rate curve that will be produced by the model is of the form $x \mapsto G(z, x)$ for some choice of $z$. Thus there exists a finite dimensional invariant submanifold $\mathcal{G}$ containing the initial forward rate curve $r^o$, namely $\mathcal{G} = \operatorname{Im}[G]$.

Using Theorem 3.11 we immediately obtain the following geometric characterization of the existence of a finite realization.


Corollary 4.7 The forward rate process possesses a finite dimensional realization if and only if there exists a finite dimensional manifold $\mathcal{G}$ containing $r^o$, such that, for each $r \in \mathcal{G}$, the following conditions hold:

$$\mu(r) \in T_{\mathcal{G}}(r), \qquad \sigma(r) \in T_{\mathcal{G}}(r).$$

Here $T_{\mathcal{G}}(r)$ denotes the tangent space to $\mathcal{G}$ at the point $r$, and the vector fields $\mu$ and $\sigma$ are as above.

4.3 The main result

Given the volatility vector field $\sigma$, and hence also the field $\mu$, we now are faced with the problem of determining whether there exists a finite dimensional manifold $\mathcal{G}$ with the property that $\mu$ and $\sigma$ are tangential to $\mathcal{G}$ at each point of $\mathcal{G}$. In the case when the underlying space is finite dimensional, this is a standard problem in differential geometry, and we will now give the heuristics.

To get some intuition we start with a simpler problem and therefore consider the space $B$ (or any other Hilbert space), and a smooth vector field $f$ on the space. For each fixed point $r^o \in B$ we now ask whether there exists a finite dimensional manifold $\mathcal{G}$ with $r^o \in \mathcal{G}$ such that $f$ is tangential to $\mathcal{G}$ at every point. The answer to this question is yes, and the manifold can in fact be chosen to be one-dimensional. To see this, consider the infinite dimensional ODE

$$\frac{dr_t}{dt} = f(r_t), \tag{69}$$

$$r_0 = r^o. \tag{70}$$

If $r_t$ is the solution, at time $t$, of this ODE, we use the notation

$$r_t = e^{ft} r^o.$$

We have thus defined a group of operators $\{ e^{ft} : t \in \mathbb{R} \}$, and we note that the set $\{ e^{ft} r^o : t \in \mathbb{R} \} \subseteq B$ is nothing else than the integral curve of the vector field $f$, passing through $r^o$. If we define $\mathcal{G}$ as this integral curve, then our problem is solved, since $f$ will be tangential to $\mathcal{G}$ by construction.

Let us now take two vector fields $f_1$ and $f_2$ as given, where the reader informally can think of $f_1$ as $\sigma$ and $f_2$ as $\mu$. We also fix an initial point $r^o \in B$ and the question is if there exists a finite dimensional manifold $\mathcal{G}$, containing $r^o$, with the property that $f_1$ and $f_2$ are both tangential to $\mathcal{G}$ at each point of $\mathcal{G}$. We call such a manifold a tangential manifold for the vector fields. At a first glance it would seem that there always exists a tangential manifold, and that it can even be chosen to be two-dimensional. The geometric idea is that we start at $r^o$ and let $f_1$ generate the integral curve $\{ e^{f_1 s} r^o : s \geq 0 \}$. For each point $e^{f_1 s} r^o$ on this curve we now let $f_2$ generate the integral curve starting at that point. This gives us the object $e^{f_2 t} e^{f_1 s} r^o$ and thus it seems that we sweep out a two-dimensional surface $\mathcal{G}$ in $B$. This is our obvious candidate for a tangential manifold.

In the general case this idea will, however, not work, and the basic problem is as follows. In the construction above we started with the integral curve generated by $f_1$ and then applied $f_2$, and there is of course no guarantee that we will obtain the same surface if we start with $f_2$ and then apply $f_1$. We thus have some sort of commutativity problem, and the key concept is the Lie bracket.

Definition 4.8 Given smooth vector fields $f$ and $g$ on $B$, the Lie bracket $[f, g]$ is a new vector field defined by

$$[f, g](r) = f'(r) g(r) - g'(r) f(r). \tag{71}$$
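A finite dimensional toy example may help to fix ideas about (71). The sketch below is an illustration only: the two vector fields on $\mathbb{R}^3$ and the evaluation point are arbitrary assumptions, not taken from the text. It evaluates the bracket with directional derivatives computed by central differences.

```python
def directional_derivative(field, r, v, h=1e-6):
    """Approximate field'(r)[v] by central differences."""
    plus = field([ri + h * vi for ri, vi in zip(r, v)])
    minus = field([ri - h * vi for ri, vi in zip(r, v)])
    return [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]

def lie_bracket(f, g, r):
    """[f, g](r) = f'(r) g(r) - g'(r) f(r), the convention of (71)."""
    df_g = directional_derivative(f, r, g(r))
    dg_f = directional_derivative(g, r, f(r))
    return [a - b for a, b in zip(df_g, dg_f)]

# Two illustrative vector fields on R^3 (assumed for this example).
def f(r):
    return [1.0, 0.0, 0.0]

def g(r):
    return [0.0, 1.0, r[0]]

point = [0.3, -1.2, 2.0]
bracket = lie_bracket(f, g, point)
print("f:", f(point), " g:", g(point), " [f, g]:", bracket)
```

Here the bracket equals $(0, 0, -1)$ at every point, which is not a linear combination of $f(r) = (1, 0, 0)$ and $g(r) = (0, 1, r_1)$. The flows of $f$ and $g$ therefore do not sweep out a common two-dimensional tangential surface, which is exactly the commutativity problem described above.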

The Lie bracket measures the lack of commutativity on the infinitesimal scale in our geometric program above, and for the procedure to work we need a condition which says that the lack of commutativity is "small". It turns out that the relevant condition is that the Lie bracket should be in the linear hull of the vector fields.

Definition 4.9 Let $f_1, \ldots, f_n$ be smooth independent vector fields on some space $X$. Such a system is called a distribution, and the distribution is said to be involutive if

$$[f_i, f_j](x) \in \operatorname{span}\{ f_1(x), \ldots, f_n(x) \}, \quad \forall i, j,$$

where the span is the linear hull over the real numbers.

We now have the following basic result, which extends a classic result from finite dimensional differential geometry (see Warner (1979)).

Theorem 4.10 (Frobenius) Let $f_1, \ldots, f_k$ be independent smooth vector fields in $B$ and consider a fixed point $r^o \in B$. Then the following statements are equivalent.
• For each point $r$ in a neighborhood of $r^o$, there exists a $k$-dimensional tangential manifold passing through $r$.
• The system $f_1, \ldots, f_k$ of vector fields is (locally) involutive.

Proof See Björk and Svensson (1999), which provides a self contained proof of the Frobenius theorem in Banach space.

Let us now go back to our interest rate model. We are thus given the vector fields $\mu$, $\sigma$, and an initial point $r^o$, and the problem is whether there exists a finite dimensional tangential manifold containing $r^o$. Using the infinite dimensional Frobenius theorem, this situation is now easily analyzed. If $\{\mu, \sigma\}$ is involutive then there exists a two-dimensional tangential manifold. If $\{\mu, \sigma\}$ is not involutive, this means that the Lie bracket $[\mu, \sigma]$ is not in the linear span of $\mu$ and $\sigma$, so then we consider the system $\{\mu, \sigma, [\mu, \sigma]\}$. If this system is involutive there exists a three-dimensional tangential manifold. If it is not involutive at least one of the brackets $[\mu, [\mu, \sigma]]$, $[\sigma, [\mu, \sigma]]$ is not in the span of $\{\mu, \sigma, [\mu, \sigma]\}$, and we then adjoin this (these) bracket(s). We continue in this way, forming brackets of brackets, and adjoining these to the linear hull of the previously obtained vector fields, until the point when the system of vector fields thus obtained actually is closed under the Lie bracket operation.

Definition 4.11 Take the vector fields $f_1, \ldots, f_k$ as given. The Lie algebra generated by $f_1, \ldots, f_k$ is the smallest linear space (over $\mathbb{R}$) of vector fields which contains $f_1, \ldots, f_k$ and is closed under the Lie bracket. This Lie algebra is denoted by

$$\mathcal{L} = \{ f_1, \ldots, f_k \}_{LA}.$$

The dimension of $\mathcal{L}$ is defined, for each point $r \in B$, as

$$\dim[\mathcal{L}(r)] = \dim \operatorname{span}\{ f_1(r), \ldots, f_k(r) \}.$$

Putting all these results together, we have the following main result on finite dimensional realizations.

Theorem 4.12 (Main result) Take the volatility mapping $\sigma = (\sigma_1, \ldots, \sigma_m)$ as given. Then the forward rate model generated by $\sigma$ generically admits a finite dimensional realization if and only if

$$\dim \{ \mu, \sigma_1, \ldots, \sigma_m \}_{LA} < \infty$$

in a neighborhood of $r^o$.

The result above thus provides a general solution to Problem II from Section 1.2. For any given specification of forward rate volatilities, the Lie algebra can in principle be computed, and the dimension can be checked. Note, however, that the theorem is a pure existence result. If, for example, the Lie algebra has dimension five, then we know that there exists a five-dimensional realization, but the theorem does not directly tell us how to construct a concrete realization. This is the subject of ongoing research. Note also that realizations are not unique, since any diffeomorphic mapping of the factor space $\mathbb{R}^d$ onto itself will give a new equivalent realization.

When computing the Lie algebra generated by $\mu$ and $\sigma$, the following observations are often useful.


Lemma 4.13 Take the vector fields $f_1, \ldots, f_k$ as given. The Lie algebra $\mathcal{L} = \{ f_1, \ldots, f_k \}_{LA}$ remains unchanged under the following operations.
• The vector field $f_i(r)$ may be replaced by $\alpha(r) f_i(r)$, where $\alpha$ is any smooth nonzero scalar field.
• The vector field $f_i(r)$ may be replaced by
$$f_i(r) + \sum_{j \neq i} \alpha_j(r) f_j(r),$$
where $\alpha_j$ is any smooth scalar field.

Proof The first point is geometrically obvious, since multiplication by a scalar field will only change the length of the vector field $f_i$, and not its direction, and thus not the tangential manifold. Formally it follows from the "Leibniz rule" $[f, \alpha g] = \alpha [f, g] - (\alpha'[f])\, g$, where $\alpha'[f]$ denotes the Fréchet derivative of $\alpha$ acting on $f$. The second point follows from the bilinear property of the Lie bracket together with the fact that $[f, f] = 0$.
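For completeness, the rule quoted in the proof can be derived in two lines from the definition (71). This is a sketch added here, where $\alpha'(r)[v]$ denotes the Fréchet derivative of the scalar field $\alpha$ at $r$ acting on the vector $v$:

```latex
% Derivation of the product rule for the Lie bracket, with [f, g] = f'g - g'f as in (71).
\begin{aligned}
[f, \alpha g](r)
  &= f'(r)\bigl(\alpha(r) g(r)\bigr) - (\alpha g)'(r)\, f(r) \\
  &= \alpha(r) f'(r) g(r) - \bigl(\alpha'(r)[f(r)]\bigr) g(r) - \alpha(r) g'(r) f(r) \\
  &= \alpha(r)\,[f, g](r) - \bigl(\alpha'(r)[f(r)]\bigr)\, g(r).
\end{aligned}
```

The bracket $[f, \alpha g]$ is thus a pointwise linear combination of $[f, g]$ and $g$, so rescaling $g$ by a nonzero scalar field does not change the span of the fields obtained by taking brackets.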

4.4 Applications

In this section we give some simple applications of the theory developed above. For more examples and results, see Björk and Svensson (1999).

4.4.1 Constant Volatility

We start with the simplest case, which is when the volatility $\sigma(r, x)$ is a constant vector in $B$. We are thus back in the framework of Section 2, and we assume for simplicity that we have only one driving Wiener process. Then we have no Stratonovich correction term and the vector fields are given by

$$\mu(r, x) = \mathbf{F} r(x) + \sigma(x) \int_0^x \sigma(s)\,ds, \qquad \sigma(r, x) = \sigma(x),$$

where as before $\mathbf{F} = \frac{\partial}{\partial x}$. The Fréchet derivatives are trivial in this case. Since $\mathbf{F}$ is linear (and bounded in our space), and $\sigma$ is constant as a function of $r$, we obtain

$$\mu'_r = \mathbf{F}, \qquad \sigma'_r = 0.$$

Thus the Lie bracket $[\mu, \sigma]$ is given by

$$[\mu, \sigma] = \mathbf{F}\sigma,$$

and in the same way we have

$$[\mu, [\mu, \sigma]] = \mathbf{F}^2 \sigma.$$

Continuing in the same manner it is easily seen that the relevant Lie algebra $\mathcal{L}$ is given by

$$\mathcal{L} = \{\mu, \sigma\}_{LA} = \operatorname{span}\{ \mu, \sigma, \mathbf{F}\sigma, \mathbf{F}^2\sigma, \ldots \} = \operatorname{span}\{ \mu, \mathbf{F}^n \sigma;\ n = 0, 1, 2, \ldots \}.$$

It is thus clear that $\mathcal{L}$ is finite dimensional (at each point $r$) if and only if the function space

$$\operatorname{span}\{ \mathbf{F}^n \sigma;\ n = 0, 1, 2, \ldots \}$$

is finite dimensional. We have thus obtained our old condition from Proposition 2.12 and we have the following result which extends Proposition 2.4 by in principle allowing the realization to be nonlinear.

Proposition 4.14 Under the above assumptions, there exists a finite dimensional realization if and only if $\sigma$ is a quasi-exponential function.

4.4.2 Constant Direction Volatility

We go on to study the most natural extension of the deterministic volatility case (still in the case of a scalar Wiener process), namely the case when the volatility is of the form

$$\sigma(r, x) = \varphi(r)\lambda(x). \tag{72}$$

In this case the individual vector field $\sigma$ has the constant direction $\lambda \in H$, but is of varying length, determined by $\varphi$, where $\varphi$ is allowed to be any smooth functional of the entire forward rate curve. In order to avoid trivialities we make the following assumption.

Assumption 4.15 We assume that $\varphi(r) \neq 0$ for all $r \in H$.

After a simple calculation the drift vector $\mu$ turns out to be

$$\mu(r) = \mathbf{F} r + \varphi^2(r) D - \frac{1}{2} \varphi'(r)[\lambda]\, \varphi(r) \lambda, \tag{73}$$

where $\varphi'(r)[\lambda]$ denotes the Fréchet derivative $\varphi'(r)$ acting on the vector $\lambda$, and where the constant vector $D \in H$ is given by

$$D(x) = \lambda(x) \int_0^x \lambda(s)\,ds.$$


We now want to know under what conditions on $\varphi$ and $\lambda$ we have a finite dimensional realization, i.e. when the Lie algebra generated by

$$\mu(r) = \mathbf{F} r + \varphi^2(r) D - \frac{1}{2} \varphi'(r)[\lambda]\, \varphi(r) \lambda, \qquad \sigma(r) = \varphi(r) \lambda,$$

is finite dimensional. Under Assumption 4.15 we can use Lemma 4.13 to see that the Lie algebra is in fact generated by the simpler system of vector fields

$$f_0(r) = \mathbf{F} r + \Phi(r) D, \qquad f_1(r) = \lambda,$$

where we have used the notation $\Phi(r) = \varphi^2(r)$. Since the field $f_1$ is constant, it has zero Fréchet derivative. Thus the first Lie bracket is easily computed as

$$[f_0, f_1](r) = \mathbf{F}\lambda + \Phi'(r)[\lambda] D.$$

The next bracket to compute is $[[f_0, f_1], f_1]$, which is given by

$$[[f_0, f_1], f_1] = \Phi''(r)[\lambda; \lambda] D.$$

Note that $\Phi''(r)[\lambda; \lambda]$ is the second order Fréchet derivative of $\Phi$ operating on the vector pair $[\lambda; \lambda]$. This pair is to be distinguished (notice the semicolon) from the Lie bracket $[\lambda, \lambda]$ (with a comma), which of course would be equal to zero. We now make a further assumption.

Assumption 4.16 We assume that $\Phi''(r)[\lambda; \lambda] \neq 0$ for all $r \in H$.

Given this assumption we may again use Lemma 4.13 to see that the Lie algebra is generated by the following vector fields

$$f_0(r) = \mathbf{F} r, \qquad f_1(r) = \lambda, \qquad f_3(r) = \mathbf{F}\lambda, \qquad f_4(r) = D.$$

Of these vector fields, all but $f_0$ are constant, so all brackets are easy. After elementary calculations we see that in fact

$$\{\mu, \sigma\}_{LA} = \operatorname{span}\{ \mathbf{F} r,\ \mathbf{F}^n \lambda,\ \mathbf{F}^n D;\ n = 0, 1, \ldots \}.$$


From this expression it follows immediately that a necessary condition for the Lie algebra to be finite dimensional is that the vector space spanned by $\{\mathbf{F}^n \lambda;\ n \geq 0\}$ is finite dimensional. This occurs if and only if $\lambda$ is quasi-exponential (see Remark 2.5). If, on the other hand, $\lambda$ is quasi-exponential, then we know from Lemma 2.6 that $D$ is also quasi-exponential, since it is the integral of the QE function $\lambda$ multiplied by the QE function $\lambda$. Thus the space $\{\mathbf{F}^n D;\ n = 0, 1, \ldots\}$ is also finite dimensional, and we have proved the following result.

Proposition 4.17 Under Assumptions 4.15 and 4.16, the interest rate model with volatility given by $\sigma(r, x) = \varphi(r)\lambda(x)$ has a finite dimensional realization if and only if $\lambda$ is a quasi-exponential function. The scalar field $\varphi$ is allowed to be any smooth field.

4.4.3 When is the Short Rate a Markov Process?

One of the classical problems concerning the HJM approach to interest rate modeling is that of determining when a given forward rate model is realized by a short rate model, i.e. when the short rate is Markovian. We now briefly indicate how the theory developed above can be used in order to analyze this question. For the full theory see Björk and Svensson (1999).

Using the results above, we immediately have the following general necessary condition.

Proposition 4.18 The forward rate model generated by $\sigma$ is a generic short rate model, i.e. the short rate is generically a Markov process, only if

$$\dim \{\mu, \sigma\}_{LA} \leq 2. \tag{74}$$

Proof If the model is really a short rate model, then bond prices are given as $p(t, x) = F(t, R_t, x)$ where $F$ solves the term structure PDE. Thus bond prices, and forward rates, are generated by a two-dimensional factor model with time $t$ and the short rate $R$ as the state variables.

Remark 4.19 The most natural case is $\dim \{\mu, \sigma\}_{LA} = 2$. It is an open problem whether there exists a non-deterministic generic short rate model with $\dim \{\mu, \sigma\}_{LA} = 1$.

Note that condition (74) is only a necessary condition for the existence of a short rate realization. It guarantees that there exists a two-dimensional realization, but the question remains whether the realization can be chosen in such a way that the short rate and running time are the state variables. This question is completely resolved by the following central result.


Theorem 4.20 Assume that the model is not deterministic, and take as given a time invariant volatility σ (r, x). Then there exists a short rate realization if and only if the vector fields [µ, σ ] and σ are parallel, i.e. if and only if there exists a scalar field α(r ) such that the following relation holds (locally) for all r . [µ, σ ] (r ) = α(r )σ (r ).

(75)

Proof See Bj¨ork and Svensson (1999). It turns out that the class of generic short rate models is very small indeed. We have, in fact, the following result, which was first proved in Jeffrey (1995) (using techniques different from those above). See Bj¨ork and Svensson (1999) for a proof based on Theorem 4.20. Theorem 4.21 Consider an HJM model with one driving Wiener process and a volatility structure of the form σ (r, x) = g(R, x). where R = r (0) is the short rate. Then the model is a generic short rate model if and only if g has one of the following forms. • There exists a constant c such that g(R, x) ≡ c. • There exist constants a and c such that. g(R, x) = ce−ax . • There exist constants a and b, and a function α(x), where α satisfies a certain Riccati equation, such that √ g(R, x) = α(x) a R + b. We immediately recognize these cases as the Ho–Lee model, the Hull–White extended Vasiˇcek model, and the Hull–White extended Cox–Ingersoll–Ross model (Cox, Ingersoll and Ross (1985)). Thus, in this sense the only generic short rate models are the affine ones, and the moral of this, perhaps somewhat surprising, result is that most short rate models considered in the literature are not generic but “accidental”. To understand the geometric picture one can think of the following program. 1. Choose an arbitrary short rate model, say of the form d Rt = a(Rt )dt + b(Rt )dWt with a fixed initial point R0 .


2. Solve the associated PDE in order to compute bond prices. This will also produce:
   • An initial forward rate curve $\hat r^o(x)$.
   • Forward rate volatilities of the form $g(R, x)$.
3. Forget about the underlying short rate model, and take the forward rate volatility structure $g(R, x)$ as given in the forward rate equation.
4. Initiate the forward rate equation with an arbitrary initial forward rate curve $r^o(x)$.

The question is now whether the thus constructed forward rate model will produce a Markovian short rate process. Obviously, if you choose the initial forward rate curve $r^o$ as $r^o = \hat r^o$, then you are back where you started, and everything is OK. If, however, you choose another initial forward rate curve rather than $\hat r^o$, say the observed forward rate curve of today, then it is no longer clear that the short rate will be Markovian. What the theorem above says is that only the models listed above will produce a Markovian short rate model for all initial points in a neighborhood of $\hat r^o$. If you take another model (like, say, the Dothan model) then a generic choice of the initial forward rate curve will produce a short rate process which is not Markovian.

4.5 Notes

The section is based on Björk and Svensson (1999), where full proofs and further results can be found, and where the time varying case is also considered. In our study of the constant direction model above, $\varphi$ was allowed to be any smooth functional of the entire forward rate curve. The simpler special case when $\varphi$ is a point evaluation of the short rate, i.e. of the form $\varphi(r) = h(r(0))$, has been studied in Bhar and Chiarella (1997), Inui and Kijima (1998) and Ritchken and Sankarasubramanian (1995). All these cases fall within our present framework and the results are included as special cases of the general theory above. A different case, treated in Chiarella and Kwon (1998), occurs when $\sigma$ is a finite point evaluation, i.e. when $\sigma(t, r) = h(t, r(x_1), \ldots, r(x_k))$ for fixed benchmark maturities $x_1, \ldots, x_k$. Chiarella and Kwon (1998) study when the corresponding finite set of benchmark forward rates is Markovian. A classic paper on Markovian short rates is Carverhill (1994), where a deterministic volatility of the form $\sigma(t, x)$ is considered. Theorem 4.21 was first stated and proved in Jeffrey (1995). See Eberlein and Raible (1999) for an example with a driving Lévy process.

The geometric ideas presented above and in Björk and Svensson (1999) are intimately connected to controllability problems in systems theory, where they have been used extensively (see Isidori (1989)). They have also been used in filtering theory, where the problem is to find a finite dimensional realization of the unnormalized conditional density process, the evolution of which is given by the Zakai equation. See Brockett (1981) for an overview of these areas.

References

Bhar, R. and Chiarella, C. (1997), Transformation of Heath–Jarrow–Morton models to Markovian systems. European Journal of Finance 3, 1, 1–26.
Björk, T. (1997), Interest Rate Theory. In W. Runggaldier (ed.), Financial Mathematics. Springer Lecture Notes in Mathematics, Vol. 1656. Springer-Verlag, Berlin.
Björk, T. and Christensen, B.J. (1999), Interest rate dynamics and consistent forward rate curves. Mathematical Finance 9, 4, 323–48.
Björk, T. and Gombani, A. (1999), Minimal realization of interest rate models. Finance and Stochastics 3, 4, 413–32.
Björk, T. and Svensson, L. (1999), On the existence of finite dimensional nonlinear realizations of interest rate models. Forthcoming in Mathematical Finance.
Brace, A. and Musiela, M. (1994), A multi factor Gauss Markov implementation of Heath Jarrow and Morton. Mathematical Finance 4, 3, 563–76.
Brockett, R.W. (1970), Finite Dimensional Linear Systems. Wiley, New York.
Brockett, R.W. (1981), Nonlinear systems and nonlinear estimation theory. In Stochastic Systems: The Mathematics of Filtering and Identification and Applications (eds. Hazewinkel, M. and Willems, J.C.). Reidel, Dordrecht.
Carverhill, A. (1994), When is the spot rate Markovian? Mathematical Finance 4, 305–12.
Chiarella, C. and Kwon, K. (1998), Forward rate dependent Markovian transformations of the Heath–Jarrow–Morton term structure model. Working paper. School of Finance and Economics, University of Technology, Sydney.
Cox, J., Ingersoll, J. and Ross, S. (1985), A theory of the term structure of interest rates. Econometrica 53, 385–408.
Da Prato, G. and Zabczyk, J. (1992), Stochastic Equations in Infinite Dimensions. Cambridge University Press, Cambridge.
Duffie, D. and Kan, R. (1996), A yield factor model of interest rates. Mathematical Finance 6, 379–406.
Eberlein, E. and Raible, S. (1999), Term structure models driven by general Lévy processes. Mathematical Finance 9, 31–53.
El Karoui, N. and Lacoste, V. (1993), Multifactor models of the term structure of interest rates. Preprint.
El Karoui, N., Geman, H. and Lacoste, V. (1997), On the role of state variables in interest rate models. Preprint.
Filipović, D. (1998a), A note on the Nelson–Siegel family. Mathematical Finance 9, 4, 349–59.
Filipović, D. (1998b), Exponential–polynomial families and the term structure of interest rates. To appear in Bernoulli.
Filipović, D. (1999), Invariant manifolds for weak solutions of stochastic equations. To appear in Probability Theory and Related Fields.
Heath, D., Jarrow, R. and Morton, A. (1992), Bond pricing and the term structure of interest rates. Econometrica 60, 1, 77–106.
Ho, T. and Lee, S. (1986), Term structure movements and pricing interest rate contingent claims. Journal of Finance 41, 1011–29.
Hull, J. and White, A. (1990), Pricing interest-rate-derivative securities. The Review of Financial Studies 3, 573–92.
Inui, K. and Kijima, M. (1998), A Markovian framework in multi-factor Heath–Jarrow–Morton models. JFQA 33, 3, 423–40.
Isidori, A. (1989), Nonlinear Control Systems. Springer-Verlag, Berlin.
Jeffrey, A. (1995), Single factor Heath–Jarrow–Morton term structure models based on Markovian spot interest rates. JFQA 30, 4, 619–42.
Musiela, M. (1993), Stochastic PDEs and term structure models. Preprint.
Musiela, M. and Rutkowski, M. (1997), Martingale Methods in Financial Modeling. Springer-Verlag, Berlin, Heidelberg, New York.
Nelson, C. and Siegel, A. (1987), Parsimonious modelling of yield curves. Journal of Business 60, 473–89.
Ritchken, P. and Sankarasubramanian, L. (1995), Volatility structures of forward rates and the dynamics of the term structure. Mathematical Finance 5, 1, 55–72.
Vasiček, O. (1977), An equilibrium characterization of the term structure. Journal of Financial Economics 5, 177–88.
Warner, F.W. (1979), Foundations of Differentiable Manifolds and Lie Groups. Scott, Foresman, Hill.
Zabczyk, J. (1992), Stochastic invariance and consistency of financial models. Preprint. Scuola Normale Superiore, Pisa.

8 Towards a Central Interest Rate Model Alan Brace, Tim Dun and Geoff Barton

1 Introduction

In recent years, the appearance of a new class of term structure of interest rate models has attracted the interest of practitioners. These so-called Market Models provide both an arbitrage-free pricing framework and pricing formulae that conform to the current (and accepted) market practice. This class of model can effectively be split into two types: those that model forward Libor rates, and those that model forward swap rates. The Libor rate models, such as those introduced in Miltersen et al. (1997), Brace et al. (1997) and Musiela and Rutkowski (1997a,b), allow caps to be priced in a manner consistent with market practice, while the swap rate models, such as the one proposed by Jamshidian (1997), do the same for swaptions. However, these two approaches are fundamentally incompatible because Libor rates and swap rates cannot both be lognormal in an arbitrage-free framework.

The formulae currently in use in the market are based on extensions of the well-known Black–Scholes option formula, and are, in fact, known as the Black cap and swaption formulae. In the case of swaptions, the swap rate replaces the stock price as the market observable parameter assumed to follow lognormal dynamics. Other concepts that are related to (and easily calculated using) the Black–Scholes option formula can also be extended to the case of swaptions, such as the option sensitivities or Greeks. These give an indication as to the likely magnitude and direction of the change in option price under changes in the swap rate value and/or volatility.

The Black formulae, however, are incapable of producing arbitrage-free prices for exotics, nor are they of much use as a 'central' interest rate model to do bankwide risk management. These shortfalls constitute the original motivation for the development of term structure models. So how do the two types of Market Model mentioned above perform in these areas?


When pricing exotics, the natural tendency is to choose the most appropriate model for the task, hence Libor models for Libor based exotics, such as barrier caps triggered by Libor, and swap rate models for swap rate based exotics, such as barrier swaptions triggered by the swap rate. The case of cross-market exotics, however, is not so simple – how does one treat barrier swaptions triggered by Libor, and how does one calibrate simultaneously to both cap and swaption markets?

In the authors' opinion, the Libor model is the unifying model – the Central Interest Rate Model – capable of encompassing the global properties of the swap rate model and tackling the problems related above. This is primarily because it is the most tractable mathematically, with Libor rates being lognormal under their own measures, without the restriction of only certain families of swap rates being lognormal. The model also prices swaptions and swap rate exotics, and, as we intend to argue in this paper, in practice it prices swaptions in a manner close to that of the market and, by extension, to the forward swap rate model. This indicates a closeness between the two types of Market Model.¹ We propose in this study, therefore, to examine the Libor model and its ability to price and hedge pure swap market products in comparison to the Black swaption formula, under arbitrary yield and volatility specifications, with the aim of revealing the closeness of the two approaches.

Our methodology is as follows. First, in Section 2, the notation and equations involved in swaption pricing within the Libor model are introduced. The Black swaption formula is also presented, along with the equations necessary to calculate the swaption Greeks and hedge swaptions. In Section 3, the actual distributional properties of the swap rate within the Libor model are examined analytically, to see whether it can be approximately modelled by a lognormal process. An expression is then derived for the volatility of this swap rate, allowing the approximate pricing of swaptions inside the Libor model using a Black type formula. In Section 4, approximation techniques are applied to derive equations inside the Libor model for swaption Greeks with respect to the swap rate. Here, only approximate relations at best may be expected, since in the Libor model, the swap rate is a weighted sum of Libor rates, and not a single quantity as implied by the Black formula. These Greeks will, however, provide us with another mechanism for comparing the swaption modelling capabilities of the Libor model. Simulation techniques are then used to test the approximations from Sections 3 and 4 on a range of swaptions for two quite different volatility structures, with the results presented in Section 5. Tests are carried out to determine if the swaption Greeks derived are meaningful by undertaking a delta-hedging simulation and seeing if Libor model swaptions can be successfully hedged within the Libor model framework using Black-style hedging techniques. The results from these tests are also presented in Section 5. Finally, Section 6 states our conclusions on the work done, while the appendices contain additional results, both numerical and mathematical, for the interested reader.

¹ This closeness was first alluded to in the observation in Brace et al. (1997) that the Libor model swaption formula essentially reduces to the Black formula when yield and volatility are flat. Other authors to examine this behaviour include Jamshidian (1997) and Rebonato (1999).

2 Model preliminaries In this section, we introduce the fundamental equations behind the lognormal Libor model, together with swap and swaption pricing within this model. The equivalent market pricing equations are then presented, and option sensitivities (or Greeks) defined. The section ends with a description of a method for translating the Greeks into actual hedges. Note that all the definitions, results and formulae in this section hold for both single and multi-factor models.

2.1 Lognormal Libor model

We consider the discrete tenor version of the lognormal forward Libor model, as described in Musiela and Rutkowski (1997a,b) and Jamshidian (1997), as opposed to the continuous tenor model in Brace et al. (1997). We start with an equi-spaced tenor structure defined by $T_j = T_0 + j\delta$ for $j = 1, \ldots, n$, where $\delta$ is a constant typically of value three or six months. Time t values of zero coupon bonds expiring on the tenor dates are expressed as $P(t, T_j)$, while the forward time T price for a zero coupon bond maturing at $T_j \geq T$ is

$$F_T(t, T_j) = \frac{P(t, T_j)}{P(t, T)}.$$

The forward Libor rate $K(t, T_j)$, expressing the simple forward interest rate between tenor dates $T_j$ and $T_{j+1}$, is related to the zero coupon bonds by

$$K(t, T_j) = \frac{1}{\delta}\left(\frac{P(t, T_j)}{P(t, T_{j+1})} - 1\right).$$

We assume that we are equipped with a complete filtered probability space $(\Omega, \mathcal{F}, P)$ satisfying the 'usual conditions' (see Chapter 14 in Musiela and Rutkowski (1997a)). The dynamics of the forward Libor processes are then described by the stochastic differential equation

$$dK(t, T_{j-1}) = K(t, T_{j-1})\,\gamma(t, T_{j-1}) \cdot dW_{T_j}(t) \tag{1}$$

where $\gamma(t, T_{j-1})$ is the forward Libor volatility function, and $W_{T_j}$ represents Brownian motion under the P-equivalent forward measure $P_{T_j}$. Adjacent forward measures are related by

$$dW_{T_j}(t) = dW_{T_{j-1}}(t) + \frac{\delta K(t, T_{j-1})}{1 + \delta K(t, T_{j-1})}\,\gamma(t, T_{j-1})\,dt. \tag{2}$$
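As a minimal illustration of the definitions above, the following Python sketch computes forward Libor rates from discount factors on an equi-spaced tenor grid, together with the scalar factor $\delta K/(1+\delta K)$ that multiplies $\gamma(t, T_{j-1})\,dt$ in (2). The function names and the sample discount factors are assumptions made purely for illustration; they do not come from the chapter.

```python
def forward_libor_rates(discounts, delta):
    """K(t, T_j) = (P(t, T_j) / P(t, T_{j+1}) - 1) / delta for consecutive tenor dates."""
    return [(p0 / p1 - 1.0) / delta for p0, p1 in zip(discounts, discounts[1:])]


def forward_price(p_tj, p_t):
    """F_T(t, T_j) = P(t, T_j) / P(t, T)."""
    return p_tj / p_t


def measure_shift_factor(libor, delta):
    """Scalar delta*K / (1 + delta*K) appearing in the change of measure (2)."""
    return delta * libor / (1.0 + delta * libor)


# Hypothetical discount factors P(t, T_0), ..., P(t, T_3) with delta = 0.5 years.
discounts = [0.98, 0.955, 0.93, 0.905]
delta = 0.5
libors = forward_libor_rates(discounts, delta)
```

Compounding the resulting rates, $P(t,T_0)\prod_j (1+\delta K(t,T_j))^{-1}$, recovers the last discount factor, which is a convenient consistency check on any input curve.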

Consider now a forward payer swap, paid in arrears, with n equal rolls starting at time $T_0$. In terms of zero coupon bonds, Libor rates and a strike value $\kappa$, the time t value of the swap Pswap(t) can be written as

$$\mathrm{Pswap}(t) = \mathrm{Pswap}(t, T_0, n) = \delta \sum_{j=1}^{n} P(t, T_j)\left(K(t, T_{j-1}) - \kappa\right). \tag{3}$$

The swap rate $\omega(t)$ is that unique value of the strike which gives the swap contract zero value, and is given by

$$\omega(t) = \omega(t, T_0, n) = \frac{\sum_{j=1}^{n} P(t, T_j)\,K(t, T_{j-1})}{\sum_{j=1}^{n} P(t, T_j)} = \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,K(t, T_{j-1})}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}. \tag{4}$$

A swaption is formally defined as an option maturing at time $T_0$, on an underlying swap with strike $\kappa$. If the swap rate is greater than the strike at option maturity, then the swaption pays the difference between the two rates. The swaption price can, therefore, be expressed as

$$\mathrm{Pswpn}(t) = \delta \sum_{j=1}^{n} P(t, T_j)\,E_{T_j}\!\left[\left(K(T, T_{j-1}) - \kappa\right) I(A) \,\middle|\, \mathcal{F}_t\right] \tag{5}$$

where $A = \{\mathrm{Swap}(T) \geq 0\}$ is the event that the swap ends up in-the-money. This expression does not allow an analytic solution; however, a good approximation can be found following the approach in Brace et al. (1997) or Brace (1996). This approximation was originally derived for the continuous tenor version of the model; however, it is equally valid in the discrete tenor model as no dates outside of the discrete tenor structure appear in the formulae. Define the n-dimensional random vector

$$X = (X_j) \overset{\mathrm{def}}{=} \left(\int_t^{T_0} \gamma(s, T_{j-1}) \cdot dW_{T_j}(s)\right)$$

and approximate it by a Gaussian random vector by using a deterministic approximation (here a Wiener chaos expansion of order 0) to the stochastic drift term in (2). The mean vector $\mu$ and covariance matrix $\lambda$ of our approximation under the $P_{T_0}$-measure are then given by

$$X \sim N(\mu, \lambda),$$

$$\mu = (\mu_j) = \left(\sum_{i=1}^{j} \frac{\delta K(t, T_{i-1})}{1 + \delta K(t, T_{i-1})}\,\lambda_{ij}\right), \qquad \lambda = (\lambda_{ij}) = \left(\int_t^{T_0} \gamma(s, T_{i-1}) \cdot \gamma(s, T_{j-1})\,ds\right), \tag{6}$$

where N(·) represents the multi-dimensional Gaussian cumulative distribution function. We find in practice that the symmetric matrix $\lambda$ (which we will term the swaption covariance matrix) is often of rank one, meaning that it can be expressed as the outer product of a vector with itself, as in $\lambda = \Gamma\Gamma^T$. Such a decomposition can be easily found through an eigenvector/eigenvalue analysis of the matrix. Using this rank one approximation $\Gamma$, we find the value of s satisfying the relation

$$\sum_{j=1}^{n} \frac{K(t, T_{j-1}) \exp\!\left(\Gamma_j (s + d_j) - \tfrac{1}{2}\Gamma_j^2\right) - \kappa}{\prod_{i=1}^{j}\left(1 + \delta K(t, T_{i-1}) \exp\!\left(\Gamma_i (s + d_i) - \tfrac{1}{2}\Gamma_i^2\right)\right)} = 0 \tag{7}$$

with

$$d_j = \sum_{i=1}^{j} \frac{\delta K(t, T_{i-1})}{1 + \delta K(t, T_{i-1})}\,\Gamma_i,$$

and the approximate swaption price is then given by

$$\mathrm{Pswpn}(t) \approx \delta \sum_{j=1}^{n} P(t, T_j)\left[K(t, T_{j-1})\,N(h_j) - \kappa\,N(h_j - \Gamma_j)\right] \tag{8}$$

where

$$h_j = -(s + d_j - \Gamma_j). \tag{9}$$

Equation (8) provides an accurate approximation as long as the assumption holds that the covariance matrix λ is of rank one. This assumption and its implications are discussed in more detail in Sections 4.1, 5.3 and 5.5.
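The recipe in (7)–(9) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the rank one factor (written `Gamma` here) is taken as given with strictly positive entries, the strike is positive, and the root of (7) is located by a simple bracketing bisection rather than by whatever solver the authors used. The function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rank_one_swaption_price(P, K, Gamma, kappa, delta):
    """Approximate payer swaption price from (7)-(9).

    P[j-1]     = P(t, T_j),      j = 1..n
    K[j-1]     = K(t, T_{j-1}),  j = 1..n
    Gamma[j-1] = rank one factor of the swaption covariance matrix (assumed > 0)
    """
    n = len(K)
    # d_j = sum_{i<=j} delta*K_{i-1} / (1 + delta*K_{i-1}) * Gamma_i
    d, running = [], 0.0
    for k, g in zip(K, Gamma):
        running += delta * k / (1.0 + delta * k) * g
        d.append(running)

    def swap_value(s):
        """Left-hand side of (7) as a function of s."""
        total, discount = 0.0, 1.0
        for j in range(n):
            k_T = K[j] * math.exp(Gamma[j] * (s + d[j]) - 0.5 * Gamma[j] ** 2)
            discount *= 1.0 + delta * k_T
            total += (k_T - kappa) / discount
        return total

    # swap_value is increasing in s, so widen a bracket and then bisect.
    lo, hi = -1.0, 1.0
    while swap_value(lo) > 0.0:
        lo *= 2.0
    while swap_value(hi) < 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if swap_value(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)

    price = 0.0
    for j in range(n):
        h = -(s + d[j] - Gamma[j])  # equation (9)
        price += P[j] * (K[j] * norm_cdf(h) - kappa * norm_cdf(h - Gamma[j]))
    return delta * price
```

For a single roll (n = 1) the root of (7) is available in closed form and the price collapses to a Black caplet formula, which gives a simple sanity check on an implementation.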

2.2 Market swaption formula

In the Market (or Black) swaption pricing formula, swap rates are implicitly assumed lognormal under a single measure $P_m$. For a swap of n rolls, maturing at time $T_0$, this implies the following relation between the forward swap rate $\omega(t) = \omega(t, T_0, n)$ and its associated volatility $\sigma(t) = \sigma(t, T_0, n)$:

$$d\omega(t) = \omega(t)\,\sigma(t) \cdot dW(t),$$

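where W(t) is Brownian motion under $P_m$. In terms of $\omega(t)$, the present values of a payer swap and corresponding payer swaption are

$$\mathrm{Pswap}(t) = \mathrm{Pswap}(t, T_0, n) = \delta \sum_{j=1}^{n} P(t, T_j)\,(\omega(t) - \kappa),$$

$$\mathrm{Pswpn}(t) = \mathrm{Pswpn}(t, T_0, n) = \delta \sum_{j=1}^{n} P(t, T_j)\,E\!\left[(\omega(T_0) - \kappa)^{+} \,\middle|\, \mathcal{F}_t\right] = \delta \sum_{j=1}^{n} P(t, T_j)\,B(t), \tag{10}$$

where B(t) is Black's call formula

$$B(t) = \omega(t)\,N(h) - \kappa\,N\!\left(h - \sqrt{\zeta}\right), \tag{11}$$

in this case with

$$h = \frac{\ln\!\left(\frac{\omega(t)}{\kappa}\right) + \frac{1}{2}\zeta}{\sqrt{\zeta}}, \qquad \zeta = \int_t^{T_0} |\sigma(s, T_0, n)|^2\,ds. \tag{12}$$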

We denote the term $\zeta$ as the swaption zeta, representing a volatility term which also contains information on the time to maturity of the option. We will use it below to define a version of the option vega. For the sake of convenience, we denote the sum $\sum_{j=1}^{n} \delta P(t, T_j)$ as the present value of a basis point, or PVBP. In other references this sum has been given various other names, including the coupon process, the level, or even the annuity price.

The definition of sensitivities (or Greeks) for swaptions differs slightly from standard Black–Scholes type options due to the presence of the PVBP term and the fact that the swap rate is a forward rather than a spot value. We define, therefore, our Greeks in terms of forward values into the swaption discounted by the PVBP, this being a sensible definition in terms of hedging, as will be discussed in Section 2.3. This reduces the expressions for the Greeks to partial derivatives of the Black term B(t), as in

$$\text{Swaption delta} = \frac{\partial}{\partial \omega}\left(\frac{\mathrm{Pswpn}(t)}{\delta \sum_{j=1}^{n} P(t, T_j)}\right) = \frac{\partial B}{\partial \omega} = N(h), \tag{13}$$

$$\text{Swaption gamma} = \frac{\partial^2}{\partial \omega^2}\left(\frac{\mathrm{Pswpn}(t)}{\delta \sum_{j=1}^{n} P(t, T_j)}\right) = \frac{\partial^2 B}{\partial \omega^2} = \frac{1}{\omega\sqrt{\zeta}}\,N'(h), \tag{14}$$

and

$$\text{Swaption vega} = \frac{\partial}{\partial \zeta}\left(\frac{\mathrm{Pswpn}(t)}{\delta \sum_{j=1}^{n} P(t, T_j)}\right) = \frac{\partial B}{\partial \zeta} = \frac{\omega}{2\sqrt{\zeta}}\,N'(h), \tag{15}$$

where, as indicated above, we define our vega term slightly differently from the traditional way in that it is the derivative with respect to the swaption zeta, rather than an annualised volatility value as in Black–Scholes. This is done simply to ease computation later. Note that $N'(\cdot)$ represents the Gaussian density function. Note also that our gamma and vega are connected by the relation

$$\text{Swaption vega} = \tfrac{1}{2}\,\omega^2 \times \text{Swaption gamma}, \tag{16}$$

and we would expect our approximate formulae for gamma and vega in the lognormal Libor model (derived in Section 4) to satisfy this same constraint.
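The Black term (11)–(12) and the Greeks (13)–(15) are straightforward to evaluate numerically. The sketch below is an illustration only: the function names, the sample inputs and the PVBP value are assumptions, and the vega is taken with respect to the swaption zeta, as in the text.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def black_term(omega, kappa, zeta):
    """Black's call formula (11)-(12), with zeta the integrated squared volatility."""
    root = math.sqrt(zeta)
    h = (math.log(omega / kappa) + 0.5 * zeta) / root
    return omega * norm_cdf(h) - kappa * norm_cdf(h - root)


def black_greeks(omega, kappa, zeta):
    """Delta (13), gamma (14) and zeta-vega (15) of the Black term B."""
    root = math.sqrt(zeta)
    h = (math.log(omega / kappa) + 0.5 * zeta) / root
    delta = norm_cdf(h)
    gamma = norm_pdf(h) / (omega * root)
    vega = omega * norm_pdf(h) / (2.0 * root)
    return delta, gamma, vega


# Hypothetical inputs: 6% forward swap rate, 5.5% strike, 20% volatility over 2 years.
omega, kappa, zeta = 0.06, 0.055, 0.20 ** 2 * 2.0
pvbp = 3.6  # assumed value of delta * sum_j P(t, T_j)
price = pvbp * black_term(omega, kappa, zeta)
delta_, gamma_, vega_ = black_greeks(omega, kappa, zeta)
```

With these definitions the relation (16) holds identically, and the delta and vega can be cross-checked against central finite differences of `black_term` in `omega` and `zeta` respectively.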

2.3 Swaption hedging

For Black–Scholes type options, the option delta not only describes the first-order sensitivity of the option value to the underlying, but it also represents the probability of exercise of the option and hence can be used for hedging, giving the required hedge ratio into the underlying. The extension of this to the case of swaptions is complicated by the presence of the PVBP discount term in the pricing formula (10), and the fact that the swap rate is not a traded asset. One method² is to hedge using the underlying forward swap and the PVBP as the hedging instruments. The hedge then consists of two elements
• a delta hedge of amount $\Delta = N(h)$ (the swaption delta from Section 2.2) into the underlying forward swap Pswap(t), and
• a bucket hedge of $\left(B(t) - \Delta\,(\omega(t) - \kappa)\right)$ into the PVBP.
This produces a portfolio which matches the swaption in value, since $\Delta\,\mathrm{PVBP}\,(\omega(t) - \kappa) + \left(B(t) - \Delta\,(\omega(t) - \kappa)\right)\mathrm{PVBP} = \mathrm{PVBP}\;B(t)$, and which, with continual rebalancing, should match the swaption payoff at maturity. Often in practice the swaption is delta-hedged with the underlying swap while the PVBP terms are absorbed into the underlying book as cash flows, where they are hedged as part of the general exposure in different time buckets.

² For other methods see Dudenhausen et al. (1998) or Dun et al. (1999).

3 Swap rate dynamics in the Libor model

The Libor model is deliberately constructed in such a way that the forward Libor rates will be lognormal under certain probability measures, called forward measures, induced by using zero coupon bond prices as the numeraire. Similarly the lognormal swap rate model chooses a specific numeraire so that under the measure it induces the forward swap rates will be lognormal. While this numeraire is quite valid within the Libor model framework, analytic tractability can only be obtained if we know the swap rate dynamics under one of the forward measures. Hence the aim of this section is to investigate the possibility of the swap rate being approximately lognormal under a certain forward measure, in this case the one corresponding to the maturity of the swaption, $P_{T_0}$, and to find an expression for its corresponding volatility.

3.1 Swap rate measure in the Libor model

The swap rate measure is the one induced by taking the PVBP $= \sum_{j=1}^{n} \delta P(t, T_j)$ as the numeraire. Under this measure the swap rate $\omega(t, T_0, n)$ will be a martingale. Denoting this measure, and the Brownian motion under it, as $\tilde{P}_{T_0}$ and $\tilde{W}_{T_0}(t)$ respectively, we can demonstrate the relationship between $\tilde{P}_{T_0}$ and the Libor model maturity forward measure $P_{T_0}$ as follows. Taking an arbitrary zero coupon bond $P(t, T_k)$ and applying Itô's lemma to the quotient of it and the PVBP, we obtain

$$d\left(\frac{P(t, T_k)}{\delta \sum_{j=1}^{n} P(t, T_j)}\right) = d\left(\frac{F_{T_0}(t, T_k)}{\delta \sum_{j=1}^{n} F_{T_0}(t, T_j)}\right)$$

$$= \frac{F_{T_0}(t, T_k)}{\delta \sum_{j=1}^{n} F_{T_0}(t, T_j)} \left(\frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,\sigma(t, j)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)} - \sigma(t, k)\right) \cdot \left(dW_{T_0}(t) + \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,\sigma(t, j)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}\,dt\right), \tag{17}$$

where we define $\sigma(t, n)$ as the stochastic function

$$\sigma(t, n) = \sum_{i=1}^{n} \frac{\delta K(t, T_{i-1})}{1 + \delta K(t, T_{i-1})}\,\gamma(t, T_{i-1}).$$

The expression (17) is a martingale under $\tilde{P}_{T_0}$, which implies

$$d\tilde{W}_{T_0}(t) = dW_{T_0}(t) + \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,\sigma(t, j)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}\,dt, \tag{18}$$

giving us an explicit relation between Brownian motion under the swap rate measure $\tilde{P}_{T_0}$ and the swaption maturity forward measure $P_{T_0}$. Further, by applying (2) recursively we arrive at

$$d\tilde{W}_{T_0}(t) = \frac{\sum_{j=1}^{n} F_{T_0}(t, T_j)\,dW_{T_j}(t)}{\sum_{j=1}^{n} F_{T_0}(t, T_j)}, \tag{19}$$

implying not only that $\tilde{P}_{T_0}$ is an equivalent measure to the forward measures $P_{T_j}$, but the Brownian motion $\tilde{W}_{T_0}$ under this measure is in fact a weighted average of the $W_{T_j}$. Given this relationship, and recalling that the swap rate will be a martingale under $\tilde{P}_{T_0}$, we feel justified in looking for a lognormal approximation to the swap rate $\omega(t, T_0, n)$ under any other of the $P_{T_j}$, and in particular $P_{T_0}$. Effectively we are choosing to neglect the drift term in (18), an assumption that we will verify by simulation in Section 5.1. Our next step is, assuming an approximate lognormal swap rate distribution under $P_{T_0}$, to derive an expression for its volatility.
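To get a feel for the size of the drift term in (18) that is being neglected, the following Python sketch evaluates it at a single point in time. It is a minimal illustration under stated assumptions: the Libor volatilities are scalars (a single-factor model), the inputs are hypothetical, and the function names are not from the chapter.

```python
def sigma(K, gamma, delta, j):
    """sigma(t, j) = sum_{i=1}^{j} delta*K_{i-1} / (1 + delta*K_{i-1}) * gamma_{i-1}.

    K[i] and gamma[i] hold K(t, T_i) and the scalar gamma(t, T_i) for i = 0..n-1.
    """
    return sum(delta * K[i] / (1.0 + delta * K[i]) * gamma[i] for i in range(j))


def swap_measure_drift(K, gamma, delta):
    """Drift coefficient in (18): forward-price weighted average of sigma(t, j), j = 1..n."""
    n = len(K)
    F, f = [], 1.0
    for k in K:  # F_{T0}(t, T_j) = prod_{i<=j} 1 / (1 + delta*K(t, T_{i-1}))
        f /= 1.0 + delta * k
        F.append(f)
    weighted = sum(F[j - 1] * sigma(K, gamma, delta, j) for j in range(1, n + 1))
    return weighted / sum(F)


# Hypothetical flat 5% Libor curve, 20% volatilities, semi-annual rolls.
K = [0.05, 0.05, 0.05]
gamma = [0.2, 0.2, 0.2]
drift = swap_measure_drift(K, gamma, 0.5)
```

Because the drift is a weighted average of the $\sigma(t, j)$, it always lies between $\sigma(t, 1)$ and $\sigma(t, n)$ when the volatilities are positive.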

3.2 Approximate swap rate volatility

As the swap rate definition (4) is effectively a weighted (by forward prices $F_{T_0}(t, T_j) / \sum_{i=1}^{n} F_{T_0}(t, T_i)$) average of Libor rates $K(t, T_j)$, it seems evident that the contribution to the swap rate volatility by the $K(t, T_j)$ will be significantly greater than that of the $F_{T_0}(t, T_j)$. In fact, in this analysis and much of that which follows, we will assume that the contribution in terms of volatility of the $F_{T_0}(t, T_j)$ is negligible and regard them (and hence also the $P(t, T_j)$) as essentially constant at their initial values. This assumption is tested and justified by simulation means in Section 5.2.

Examining the individual terms which make up the swap rate (4), we see that they are martingales under the $T_0$-forward measure $P_{T_0}$, as demonstrated by Equations (20) and (21) below.

$$\frac{dF_{T_0}(t, T_j)}{F_{T_0}(t, T_j)} = -\sigma(t, j) \cdot dW_{T_0}(t) \tag{20}$$

$$\frac{d\left(F_{T_0}(t, T_j)\,K(t, T_{j-1})\right)}{F_{T_0}(t, T_j)\,K(t, T_{j-1})} = \left(\gamma(t, T_{j-1}) - \sigma(t, j)\right) \cdot dW_{T_0}(t). \tag{21}$$

These terms will become lognormal if the stochastic term $\sigma(t, j)$ is approximated deterministically. In this case, both the numerator and denominator of (4) will be sums of lognormal processes, and these sums will also be approximately lognormal, as in the standard approximations used to price average rate options. Hence, the swap rate $\omega(t, T, n)$, being the ratio of approximate lognormal processes under $P_{T_0}$, ought to be approximately lognormal itself (with a drift) under the same measure. Following this reasoning, we model the swap rate dynamics under $P_{T_0}$ as

$$d\omega(t, T, n) = \omega(t, T, n)\left[\mu(t, T_0, n)\,dt + \gamma(t, T_0, n) \cdot dW_{T_0}(t)\right] \tag{22}$$

and, neglecting the volatility contribution of the $F_{T_0}(t, T_j)$ as suggested above, we obtain the following approximate expression for the swap rate volatility $\gamma(t, T_0, n)$ in terms of the Libor rate volatilities $\gamma(t, T_j)$,

$$\gamma(t, T_0, n) = \frac{\sum_{j=1}^{n} P(0, T_j)\,K(0, T_{j-1})\,\gamma(t, T_{j-1})}{\sum_{j=1}^{n} P(0, T_j)\,K(0, T_{j-1})} = \frac{\sum_{j=1}^{n} F_{T_0}(0, T_j)\,K(0, T_{j-1})\,\gamma(t, T_{j-1})}{\sum_{j=1}^{n} F_{T_0}(0, T_j)\,K(0, T_{j-1})}. \tag{23}$$

The ability of this equation to predict Libor model swaption volatilities and prices for a given yield curve and Libor volatility function $\gamma(t, T)$ will be tested in Section 5.³
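Equation (23) is a weighted average of Libor volatility vectors and is easy to evaluate. The sketch below is illustrative only: the function names and sample data are assumptions, the volatilities are passed as plain lists (one vector per Libor rate, so multi-factor structures are allowed), and the zeta helper further assumes that the Libor volatilities are constant in time.

```python
def swap_rate_volatility(P, K, gammas):
    """Approximation (23): weights P(0, T_j) K(0, T_{j-1}) applied to Libor volatility vectors.

    gammas[j-1] is the (possibly multi-factor) vector gamma(t, T_{j-1}) at a fixed time t.
    """
    weights = [p * k for p, k in zip(P, K)]
    total = sum(weights)
    factors = len(gammas[0])
    return [sum(w * g[f] for w, g in zip(weights, gammas)) / total for f in range(factors)]


def swaption_zeta(P, K, gammas, horizon):
    """zeta = integral of |gamma(s, T0, n)|^2 ds over [t, T0], for time-constant volatilities."""
    vol = swap_rate_volatility(P, K, gammas)
    return sum(v * v for v in vol) * horizon


# Hypothetical two-factor example with three rolls and T0 - t = 2 years.
P = [0.95, 0.92, 0.89]
K = [0.050, 0.052, 0.054]
gammas = [[0.20, 0.05], [0.18, 0.02], [0.16, -0.01]]
vol = swap_rate_volatility(P, K, gammas)
zeta = swaption_zeta(P, K, gammas, 2.0)
```

The resulting `zeta` can be fed directly into the Black term (11)–(12) to obtain an approximate Libor model swaption price.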

4 Greeks in the Libor model

Another mechanism for assessing the closeness of swaption pricing within the Libor model to the Black swaption formula is through the calculation of the swaption Greeks. In this section we use approximation techniques to derive equations for the swaption delta, gamma and vega under arbitrary volatility specifications. As seen in Section 2.2, the definition and computation of the swaption delta, gamma and vega are straightforward in the framework implied by the Black swaption formula. Here, the swap rate is a real variable with respect to which we can differentiate, and its corresponding volatility can be expressed likewise, even if the model is multi-factor.

For the Libor model, however, the swap rate is not a single quantity but a forward price-weighted sum of Libor rates, all of which can, to a certain extent, behave independently. This means that we do not have a real central variable with respect to which we can differentiate in order to define and compute swaption Greeks. The Libor rates are, however, related together by the swaption covariance matrix (defined in Section 2.1) and this matrix is often of rank one for both single and multi-factor volatility structures. This effectively implies that the Libor rates can, in fact, be described by a single variable. Taking this idea further, it implies, given the assumption of a rank one covariance matrix, the existence of a variable with which we can differentiate and define Greeks in the Libor model. This notion will be central to our approximation calculations below. Note that all the equations derived in this section will be examined numerically in Section 5.

³ Note that an equivalent expression to (23) is independently derived by Rebonato (1999), who also employs simulation techniques to verify his results.

4.1 Approximations

Here we give a formal list and explanation of the approximations and assumptions required to derive the equations for the swaption Greeks within the Libor model. Labelling them A1 to A4, we have:
A1. The discount terms ($F_{T_0}(t, T_j)$, $P(t, T_j)$) are constant at their initial time zero values;
A2. The swaption covariance matrix is of rank one;
A3. The volatility function is one-factor separable; and
A4. The forward probability measures can be merged into one single measure.

Approximation A1 was previously introduced in Section 3.2 where it was observed that the contribution of the volatility of the forward prices (and hence the zero coupon bonds) is essentially negligible. Assumption A2 is required in order to interrelate the Libor rates, and is, in fact, equivalent to A3, which is only included as a separate assumption for reasons of clarity. A3 assumes that we can approximate our (in general multi-factor) volatility function $\gamma(t, T)$ by a single-factor separable model, as in

$$\gamma^{\mathrm{approx}}(t, T) = \psi(t)\,\phi(T). \tag{24}$$

While this assumption seems quite restrictive, we note (see Appendix B) that it is entirely equivalent to Assumption A2, in that the volatility structure is separable if and only if the swaption covariance matrix is of rank one. Numerical results suggest that for most (non-extreme) volatility structures, the swaption covariance matrix is very close to rank one, validating both assumptions A2 and A3. This is considered in more detail in Section 5.3. The approximation (24) is constructed in such a way that it returns the rank one swaption covariance matrix

$$(\lambda_{i,j}) = \int_t^{T_0} \gamma(s, T_{i-1}) \cdot \gamma(s, T_{j-1})\,ds = \phi(T_{i-1})\,\phi(T_{j-1}) \int_t^{T_0} \psi^2(s)\,ds = \Gamma\Gamma^T,$$

implying

$$\Gamma_j = \phi(T_{j-1}) \sqrt{\int_t^{T_0} \psi^2(s)\,ds}. \tag{25}$$
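The link between a separable volatility (24) and a rank one covariance matrix (25) can be checked numerically. The following Python sketch is an illustration under stated assumptions: the values of $\phi$ and of the integral of $\psi^2$ are made up, the rank one factor is written `Gamma`, and the rank test uses the elementary fact that a matrix has rank at most one exactly when all of its 2×2 minors vanish.

```python
import math


def rank_one_factor(phi, psi_sq_integral):
    """Gamma_j = phi(T_{j-1}) * sqrt(int_t^{T0} psi^2(s) ds), as in (25)."""
    scale = math.sqrt(psi_sq_integral)
    return [p * scale for p in phi]


def covariance_from_factor(Gamma):
    """lambda = Gamma Gamma^T, returned as a list of rows."""
    return [[gi * gj for gj in Gamma] for gi in Gamma]


def max_minor(lam):
    """Largest absolute 2x2 minor; it is zero exactly when lam has rank at most one."""
    n = len(lam)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    worst = max(worst, abs(lam[i][k] * lam[j][l] - lam[i][l] * lam[j][k]))
    return worst


# Hypothetical separable structure: psi = 1 on a two-year horizon, phi decreasing in maturity.
phi = [0.20, 0.18, 0.16]
Gamma = rank_one_factor(phi, 2.0)
lam = covariance_from_factor(Gamma)
```

For a covariance matrix that is only approximately rank one, `max_minor` gives a crude measure of the departure; the eigenvalue analysis mentioned in Section 2.1 would be the more usual diagnostic.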

Approximation A4 is used in simplifying the relationship between the Libor rates and in the computation of the swaption gamma and vega. Essentially it is analogous to the implicit assumption in the Black swaption formula (mentioned in Section 2.2) that the swap rates are assumed lognormal under a single measure Pm .

We assume that calendar time t = 0 and introduce the abbreviated notation $K_j \sim K(0, T_j)$, $P_j \sim P(0, T_j)$, and $\phi_j \sim \phi(T_j)$, and the variable U satisfying $dU = \psi(t)\,dW(t)$, where W(t) is Brownian motion under the single measure into which all the forward measures have been merged. Applying assumptions A1, A3 and A4 to Equations (1) and (4), we have the following simplified equations for the Libor and swap rate processes

$$dK(t, T_{j-1}) = K(t, T_{j-1})\,\psi(t)\,\phi_{j-1}\,dW_{T_j}(t) = K(t, T_{j-1})\,\phi_{j-1}\,dU, \tag{26}$$

and

$$d\omega = \frac{\sum_j P_j\,K_{j-1}\,\phi_{j-1}}{\sum_j P_j}\,dU. \tag{27}$$

With these assumptions/approximations, we can now proceed to derive equations for the swaption Greeks in the Libor model.

4.2 Libor model delta In the case of single-factor volatility functions, a swaption delta can be derived with minimal approximation by eliminating stochastic terms in the stochastic differential equations for the swap and swaption. Here we consider a different method involving differentiation inside the expectation term, a method which will be further utilised in Section 4.3 to derive an expression for the swaption gamma. Note however that both methods would produce an equivalent expression for the swaption delta. Define $\Delta_{i-1}$ to be the partial derivative of the swaption price with respect to the Libor rate $K(0,T_{i-1})$. Denoting the swaption price Pswpn(0) as S, we have, using (5),
$$\Delta_{i-1} = \frac{\partial S}{\partial K_{i-1}} = \frac{\partial}{\partial K_{i-1}}\,\delta\sum_j P_j E_{T_j}\left\{\left[K(T,T_{j-1}) - \kappa\right] I(A)\right\}$$
$$= \delta\sum_j P_j E_{T_j}\left\{\frac{\partial K(T,T_{j-1})}{\partial K_{i-1}}\,I(A) + \left[K(T,T_{j-1}) - \kappa\right]\frac{\partial I(A)}{\partial K_{i-1}}\right\}.$$
By measure transformation, the second term inside the expectation can be shown to equate to
$$P(0,T)\,E_T\left\{Swap(T)\,\frac{\partial I(A)}{\partial K_{i-1}}\right\} = 0$$

A. Brace, T. Dun and G. Barton

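since
$$\frac{\partial I(A)}{\partial K(0,T_{i-1})} = 0 \quad \text{if } Swap(T) \neq 0.$$
Using the integrated version of Equation (1), we can then show that the remaining expression reduces to
$$\Delta_{i-1} = \delta P_i N(h_i) \qquad (28)$$
where the $h_i$ are given by (9). Treating U as a real variable, we now obtain an expression for the swaption delta in the Libor model using the definition (13) from Section 2.2,
$$\Delta = \frac{\partial}{\partial\omega}\left[\frac{S}{\delta\sum_j P_j}\right] = \frac{1}{\delta\sum_j P_j}\frac{\partial S}{\partial\omega} \qquad (29)$$
$$= \frac{1}{\delta\sum_j P_j}\sum_j \frac{\partial S}{\partial K_{j-1}}\frac{\partial K_{j-1}}{\partial U}\frac{\partial U}{\partial\omega} = \frac{1}{\delta\sum_j P_j}\sum_j \Delta_{j-1}K_{j-1}\phi_{j-1}\frac{\sum_i P_i}{\sum_i P_i K_{i-1}\phi_{i-1}} = \frac{\sum_j P_j N(h_j) K_{j-1}\phi_{j-1}}{\sum_j P_j K_{j-1}\phi_{j-1}}. \qquad (30)$$
Equation (30) is tested against the Black swaption in Section 5.6, and in terms of swaption hedging in Section 5.8.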
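As an illustration of how the delta expression (30) might be evaluated, the following is a minimal Python sketch. It rests on several assumptions that are not confirmed by the text: Equation (9) is not reproduced here, so h_j is taken as Γ_j − s (the form used in Section 4.4 when d_j ≈ 0); the root s is found from condition (31) with time-0 bond prices standing in for P(T, T_j); and the function names (`solve_s`, `libor_delta`) are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def solve_s(P, K, Gam, kappa, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection root of sum_j P_j [K_{j-1} exp(Gam_j s - Gam_j^2/2) - kappa] = 0.

    Assumes positive Gam_j and K_{j-1}, so the sum is increasing in s,
    and assumes the root lies inside [lo, hi].
    """
    def f(s):
        return sum(p * (k * math.exp(g * s - 0.5 * g * g) - kappa)
                   for p, k, g in zip(P, K, Gam))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def libor_delta(P, K, phi, V, kappa):
    """Swaption delta following (30).

    P[j], K[j], phi[j] hold P_j, K_{j-1}, phi_{j-1}; V is the integral of psi^2.
    h_j = Gam_j - s is an assumption (d_j ~ 0), not taken from Equation (9).
    """
    Gam = [f * math.sqrt(V) for f in phi]      # equation (25)
    s = solve_s(P, K, Gam, kappa)
    num = sum(p * norm_cdf(g - s) * k * f
              for p, k, g, f in zip(P, K, Gam, phi))
    den = sum(p * k * f for p, k, f in zip(P, K, phi))
    return num / den
```

With a flat set of rates and equal φ values, the weights cancel and the delta collapses to a single N(Γ − s), which gives a simple sanity check on the implementation.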

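4.3 Libor model gamma Building on the approach of Section 4.2, we can now derive an expression for the swaption gamma in the Libor model. The first step is to calculate second derivatives of the Libor model swaption with respect to the K(·) – which we will denote as $\Delta_{i,k}$ – and then, using the assumptions of Section 4.1, obtain a single number that can be compared to the gamma given by the Black formula. We have⁴
$$\Delta_{i-1,k-1} = \frac{\partial^2 \mathrm{Pswpn}(0)}{\partial K_{i-1}\,\partial K_{k-1}} = \delta P_i E_{T_i}\left\{\frac{\partial K(T,T_{i-1})}{\partial K_{i-1}}\,\frac{\partial I(Swap(T))}{\partial K_{k-1}}\right\}$$
$$= \delta^2 P_i E_{T_i}\left\{\frac{\partial K(T,T_{i-1})}{\partial K_{i-1}}\,\frac{\partial K(T,T_{k-1})}{\partial K_{k-1}}\,P(T,T_k) \times \delta\left\{\delta\sum_{j=1}^{n} P(T,T_j)\left[K(T,T_{j-1}) - \kappa\right]\right\}\right\}.$$

⁴ Use the formulae $\frac{d(x)^+}{dx} = I(x)$, $\frac{dI(x)}{dx} = \delta\{x\}$, where I(·) is the Heaviside function and δ{·} is the Dirac delta function.

With assumption A4, and setting Z ∼ N(0, 1), it follows that,⁵
$$\Delta_{i-1,k-1} \cong \delta^2 P_i P_k\,E\left\{e(\Gamma_i Z)\,e(\Gamma_k Z)\,\delta\left\{\delta\sum_j P(T,T_j)\left[K_{j-1}\,e(\Gamma_j Z) - \kappa\right]\right\}\right\}$$
$$= \delta^2 P_i P_k \exp(\Gamma_i\Gamma_k) \times E\left\{\delta\left\{\delta\sum_j P(T,T_j)\left[K_{j-1}\,e(\Gamma_j[Z + \Gamma_i + \Gamma_k]) - \kappa\right]\right\}\right\}.$$

⁵ If X is a random variable under some given measure, then $e(X) = \exp\left(X - \tfrac{1}{2}\mathrm{Var}\,X\right)$.

Assuming that the ‘s’ satisfying (7) also approximately satisfies
$$\sum_j P(T,T_j)\left[K_{j-1}\exp\left(\Gamma_j s - \tfrac{1}{2}\Gamma_j^2\right) - \kappa\right] = 0, \qquad (31)$$
then we have
$$\Delta_{i-1,k-1} \cong \frac{\delta P_i P_k \exp(\Gamma_i\Gamma_k)\,N'(s - \Gamma_i - \Gamma_k)}{\sum_j P_j K_{j-1}\Gamma_j \exp\left(\Gamma_j s - \tfrac{1}{2}\Gamma_j^2\right)} = \frac{\delta P_i P_k\,N'(s - \Gamma_i)\,N'(s - \Gamma_k)}{\sum_j P_j K_{j-1}\Gamma_j\,N'(s - \Gamma_j)}. \qquad (32)$$
Using our definition for the swaption gamma (14), we can derive an expression in terms of the partial derivatives derived above, giving
$$\Gamma = \frac{\partial^2}{\partial\omega^2}\left[\frac{S}{\delta\sum_j P_j}\right] = \frac{1}{\delta\sum_j P_j}\frac{\partial}{\partial\omega}\left(\frac{\partial S}{\partial\omega}\right) = \frac{1}{\delta\sum_j P_j}\sum_j \frac{\partial}{\partial K_{j-1}}\left(\frac{\partial S}{\partial\omega}\right)\frac{\partial K_{j-1}}{\partial U}\frac{\partial U}{\partial\omega}. \qquad (33)$$
Recall from Section 4.2 that we have
$$\frac{\partial K_{j-1}}{\partial U}\frac{\partial U}{\partial\omega} = K_{j-1}\Gamma_j\,\frac{\sum_i P_i}{\sum_i P_i K_{i-1}\Gamma_i},$$
$$\frac{\partial S}{\partial\omega} = \sum_j \frac{\partial S}{\partial K_{j-1}}\frac{\partial K_{j-1}}{\partial U}\frac{\partial U}{\partial\omega} = \frac{\sum_i P_i}{\sum_i P_i K_{i-1}\Gamma_i}\sum_j \Delta_{j-1}K_{j-1}\Gamma_j,$$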

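and substituting these into (33) and taking the partial derivative gives us
$$\Gamma = \frac{\sum_j P_j}{\delta\left(\sum_j P_j K_{j-1}\Gamma_j\right)^2}\sum_i\sum_j \Gamma_i\Gamma_j K_{i-1}K_{j-1}\,\Delta_{i-1,j-1}$$
$$\quad + \frac{\sum_j P_j}{\delta\left(\sum_j P_j K_{j-1}\Gamma_j\right)^3}\left[\left(\sum_j P_j K_{j-1}\Gamma_j\right)\left(\sum_j \Delta_{j-1}K_{j-1}\Gamma_j^2\right) - \left(\sum_j \Delta_{j-1}K_{j-1}\Gamma_j\right)\left(\sum_j P_j K_{j-1}\Gamma_j^2\right)\right]$$
in which the second term can be shown to be the difference of two quantities of similar order of magnitude and is hence taken to be zero. Substitution of (32) and collecting terms gives us our final expression for the Libor model swaption gamma
$$\Gamma = \frac{\sum_j P_j\,\sum_j P_j K_{j-1}\Gamma_j\,N'(s - \Gamma_j)}{\left(\sum_j P_j K_{j-1}\Gamma_j\right)^2}. \qquad (34)$$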

4.4 Libor model vega Finally, we wish to derive an equation for the swaption vega in the Libor model. Combining the approximate swap rate volatility equation (23) with Assumption A3 of an instantaneous one-factor separable volatility (24), we obtain
$$\gamma(t,T_0,n) = \psi(t)\,\frac{\sum_j P_j K_{j-1}\phi_{j-1}}{\sum_j P_j K_{j-1}}.$$
The swaption zeta in the Libor model corresponding to (12) is
$$\zeta = \int_0^{T_0} |\gamma(s,T_0,n)|^2\,ds = \int_0^{T_0}\psi^2(s)\,ds\left(\frac{\sum_j P_j K_{j-1}\phi_{j-1}}{\sum_j P_j K_{j-1}}\right)^2,$$
and following the methodology presented in Section 2.2 we want to partially differentiate with respect to this variable to obtain the vega. To do this, we will denote by V the integral $\int_0^{T_0}\psi^2(s)\,ds$ and assume that this constitutes the variable part of ζ, implying
$$\frac{\partial\zeta}{\partial V} = \left(\frac{\sum_j P_j K_{j-1}\phi_{j-1}}{\sum_j P_j K_{j-1}}\right)^2. \qquad (35)$$


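From the definition of the vega (15), we have
$$\text{vega} = \frac{\partial}{\partial\zeta}\left[\frac{\mathrm{Pswpn}(0)}{\delta\sum_j P_j}\right] = \frac{1}{\delta\sum_j P_j}\frac{\partial S}{\partial\zeta} = \frac{1}{\delta\sum_j P_j}\frac{\partial S}{\partial V}\frac{\partial V}{\partial\zeta},$$
where, in this case, we can obtain the partial derivative ∂S/∂V by direct differentiation of the swaption formula (8). Using the additional assumption (implicit in the use of (31)) that $d_j \approx 0$, gives us
$$\frac{\partial S}{\partial V} = \delta\sum_j P_j\left[K_{j-1}N'(h_j)\frac{\partial h_j}{\partial V} - \kappa N'(h_j - \Gamma_j)\frac{\partial(h_j - \Gamma_j)}{\partial V}\right]$$
$$= \delta\sum_j P_j\left[K_{j-1}N'(-s + \Gamma_j)\left(-\frac{\partial s}{\partial V} + \frac{\partial\Gamma_j}{\partial V}\right) + \kappa N'(s)\frac{\partial s}{\partial V}\right]$$
$$= -\delta\,\frac{\partial s}{\partial V}\sum_j P_j\left[K_{j-1}\exp\left(s\Gamma_j - \tfrac{1}{2}\Gamma_j^2\right) - \kappa\right]N'(s) + \delta\sum_j P_j K_{j-1}\frac{\partial\Gamma_j}{\partial V}N'(s - \Gamma_j),$$
where the first term can be seen to satisfy (31) and so can be taken as zero. Partial differentiation of (25) yields
$$\frac{\partial\Gamma_j}{\partial V} = \frac{\phi_{j-1}}{2\sqrt{V}} = \frac{\Gamma_j}{2V}$$
and hence
$$\frac{\partial S}{\partial V} = \frac{\delta}{2V}\sum_j P_j K_{j-1}\Gamma_j\,N'(s - \Gamma_j).$$
Substituting from above as necessary, the vega is therefore
$$\text{vega} = \frac{1}{\delta\sum_j P_j}\frac{\partial S}{\partial V}\frac{\partial V}{\partial\zeta} = \frac{1}{2V\sum_j P_j}\left(\frac{\sum_j P_j K_{j-1}}{\sum_j P_j K_{j-1}\phi_{j-1}}\right)^2\sum_j P_j K_{j-1}\Gamma_j\,N'(s - \Gamma_j)$$
$$= \frac{1}{2\sum_j P_j}\left(\frac{\sum_j P_j K_{j-1}}{\sum_j P_j K_{j-1}\Gamma_j}\right)^2\sum_j P_j K_{j-1}\Gamma_j\,N'(s - \Gamma_j). \qquad (36)$$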


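Noting from (4) that $\omega = \sum_j P_j K_{j-1} / \sum_j P_j$, we see that the gamma and vega equations (34) and (36) satisfy the constraint (16) imposed on them in Section 2.2,
$$\frac{\text{vega}}{\Gamma} = \frac{1}{2}\left(\frac{\sum_j P_j K_{j-1}}{\sum_j P_j}\right)^2 = \frac{1}{2}\,\omega^2.$$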
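The constraint above can be checked numerically. The sketch below evaluates (34) and (36) for an arbitrary set of inputs and compares the ratio of vega to gamma with ω²/2. It is illustrative only: the function names are hypothetical, N′ is taken to be the standard normal density, and s is obtained from (31) with time-0 bond prices in place of P(T, T_j), which is an assumption.

```python
import math

def norm_pdf(x):
    """Standard normal density, used here for N'."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def solve_s(P, K, Gam, kappa, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection root of condition (31); assumes the root lies in [lo, hi]."""
    def f(s):
        return sum(p * (k * math.exp(g * s - 0.5 * g * g) - kappa)
                   for p, k, g in zip(P, K, Gam))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def libor_gamma_vega(P, K, Gam, kappa):
    """Return (gamma, vega) following equations (34) and (36).

    P[j], K[j], Gam[j] hold P_j, K_{j-1}, Gamma_j.
    """
    s = solve_s(P, K, Gam, kappa)
    annuity = sum(P)                                   # sum_j P_j
    pk = sum(p * k for p, k in zip(P, K))              # sum_j P_j K_{j-1}
    pk_gam = sum(p * k * g for p, k, g in zip(P, K, Gam))
    weighted = sum(p * k * g * norm_pdf(s - g)
                   for p, k, g in zip(P, K, Gam))
    gamma = annuity * weighted / pk_gam ** 2                    # (34)
    vega = (pk / pk_gam) ** 2 * weighted / (2.0 * annuity)      # (36)
    return gamma, vega
```

Because the density-weighted sum appears in both expressions, it cancels in the ratio, so the check holds for any strike and does not depend on how accurately s is located.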

5 Numerical testing and results Ultimately, the closeness of swaption pricing within the Libor model to the Black swaption formula must be tested numerically. In this section, the assumptions fundamental to the analysis are verified, the regime used to test the equations is explained, and the results of the numerical testing presented. In order to test the approximate equations for volatility, pricing and Greeks thoroughly, a range of swaptions, strike values, yield curves and volatility specifications is required. In this light, it was decided to test a matrix consisting of 15 swaptions with maturity values ranging from 0.5 to 4 years, lengths of 1 to 8 years, and at strike values in-, at- and out-of-the-money. The tests were conducted for two separate volatility specifications – the first a single-factor homogeneous parameterisation to actual historic data, chosen to reflect typical market conditions – and the second, an artificial two-factor volatility function chosen to mimic a pathological market situation and stress test the results. Further details on the volatility specifications and their associated yield curves are given in Appendix C. With the Black pricing formula, the price and Greeks can all be computed upon specification of the Black volatility σ . This is not the case in the Libor model, where an equivalent Black volatility can be obtained only by first computing the price and then ‘backing out’ the volatility by solving Equation (10) for a constant valued volatility function σ . Given that any comparison between prices and Greeks would be meaningless if not computed at a Black volatility equivalent to both frameworks, we define the Libor model true price as that value obtained from simulation, and the true volatility as the value obtained by backing out the true price at-the-money. 
The necessity of this distinction becomes apparent when one notes that Libor swaption pricing formula (8) only gives an approximate price, and one that can deviate from the true value under certain circumstances. The simulated price, however, is a reflection of the exact price, and, exploiting variance reduction means, can be made as accurate as required. This provides us with a number, free of approximation, which can be used objectively for comparison purposes. We start, however, by verifying the assumptions used in deriving the various approximations.
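The ‘backing out’ step described above can be sketched in Python as follows. This is a hedged illustration rather than the routine used for the tests: Equation (10) is not reproduced in this section, so the sketch assumes it is the standard Black payer swaption formula with zeta equal to σ²T0; the function names and the bisection bracket are hypothetical choices.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_swaption(annuity, omega, kappa, sigma, T0):
    """Black payer swaption price; annuity stands for delta * sum_j P_j.

    Assumed form of Equation (10): annuity * [omega N(d1) - kappa N(d2)].
    """
    zeta = sigma * sigma * T0
    if zeta <= 0.0:
        return annuity * max(omega - kappa, 0.0)
    root = math.sqrt(zeta)
    d1 = (math.log(omega / kappa) + 0.5 * zeta) / root
    return annuity * (omega * norm_cdf(d1) - kappa * norm_cdf(d1 - root))

def implied_black_vol(price, annuity, omega, kappa, T0,
                      lo=1e-6, hi=5.0, tol=1e-12):
    """Back out the constant Black volatility that reproduces `price`.

    Uses bisection, relying on the price being increasing in sigma.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black_swaption(annuity, omega, kappa, mid, T0) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

In the testing regime of this section, `price` would be the simulated at-the-money price, and the returned value would play the role of the true volatility.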


Fig. 1. Normal probability plot of the log of the swap rates simulated under the Libor model for a 1/8 swap using the second volatility structure.

5.1 Lognormality of the swap rates In Section 3.1 it was postulated that the swap rate ω could be modelled as being approximately lognormal under the $P_{T_0}$ forward measure. This was tested numerically by simulating swap rates under the appropriate measure within the Libor model framework. The simulation was performed by discretising the stochastic differential equations for the Libor rates (1) to produce sets of future yield curves from which the swap rates could be extracted.⁶ Statistical tests were then applied to the swap rates to determine the nature of the resulting distributions. Figure 1 is an example of one of those statistical tests; a normal probability plot of the log of the simulated swap rates, in this case for an eight year swap, maturing in one year, simulated using the pathological volatility structure. A normal probability plot allows one to determine if random observations come from a normally distributed population; a straight line indicating the affirmative. Slight deviations at either end of the line are common, as a finite number of samples will never be able to fit the infinite tails of the normal distribution exactly. The test can be formalised through the use of quantitative statistical tests (such as the Shapiro–Wilk test), or a goodness-of-fit test between the expected and observed sample frequencies. The latter was used in this case. All the swaptions for both volatility structures gave similar results to those in Figure 1, and at a 95% confidence level, were shown to follow a lognormal probability distribution.

⁶ See Brace (1998) for details of the simulation routine used, and Glasserman et al. (2000) for detailed analysis of a range of simulation methods in the forward Libor model.


Fig. 2. The ratio between simulated swap rates with and without the effect of the zero coupon bonds.

5.2 Swap rate approximation The approximations in Sections 3–4 rely on the assumption that the contribution of the volatility of the discount terms (forward prices and zero coupon bonds) towards the overall volatility of the swap rate is negligible, and that the discount terms can be considered constant at their initial values. Figure 2 confirms the validity of this assumption on the swap rate for a 1/5 swap, simulated using the second volatility structure. It shows the ratio of the simulated swap rate calculated using all the discount terms, to the value obtained by taking these terms as constant. A value of 1 indicates that the calculation methods are equivalent. This figure demonstrates that the assumption is quite reasonable, leading to errors in the swap rate that are generally below one per cent.

5.3 Rank one covariance matrix The Libor model swaption formula (8) and all the analysis in Section 4 are fundamentally dependent on the assumption that the swaption covariance matrix λ is of rank one. A symmetric matrix is of rank one when it has only one non-zero eigenvalue. A rank one approximation to an arbitrary symmetric matrix will only be accurate if the ratio of the second largest to the largest eigenvalue is small.


Table 1. Ratio of the first and second eigenvalues for the swaption covariance matrices (both volatility structures).

Volatility   Swaption          Swaption maturity
structure    length      0.25      1        2        4
1            1           0.0%     7.5%     1.5%     2.1%
             2           0.0%     1.5%     3.2%     3.5%
             4           0.0%     2.1%     3.5%     5.9%
             8           0.0%     1.6%     2.7%      -
2            1           0.5%     1.0%     1.6%     1.6%
             2           4.4%     6.7%     8.2%     4.8%
             4          30.8%    27.9%    17.3%     6.4%
             8          20.5%    13.0%     7.9%      -

In the case of the Libor model, the rank of the swaption covariance matrix will depend on the form of the volatility function γ (t, T ), and the maturity and length of the individual swaption. A swaption is said to be exhibiting rank two behaviour when the rank one price (8) begins to deviate from the true price. This seems to occur for an eigenvalue ratio of 5% or above, with 20–30% representing extreme values. Table 1 shows this ratio for all the swaptions and volatility structures considered in this paper. A value of 0 represents a swaption covariance matrix of rank one. The second volatility structure was chosen for its pathological nature, and this is reflected in the more extreme values for the eigenvalue ratio seen here. It would not be surprising, therefore, if the approximations of Section 4 were to break down for some of the swaptions under the second volatility structure.
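The eigenvalue ratio discussed in this section can be illustrated with a short Python sketch. It is not the procedure used to produce Table 1: the covariance integral is approximated by a midpoint rule, the two leading eigenvalues are found by power iteration with deflation, and all function names and numerical settings are assumptions made for the example.

```python
import math

def covariance_matrix(gamma, T0, tenors, steps=2000):
    """Midpoint-rule estimate of lambda_ij = int_0^T0 gamma(s,T_i) gamma(s,T_j) ds."""
    h = T0 / steps
    n = len(tenors)
    lam = [[0.0] * n for _ in range(n)]
    for k in range(steps):
        s = (k + 0.5) * h
        g = [gamma(s, T) for T in tenors]
        for i in range(n):
            for j in range(n):
                lam[i][j] += g[i] * g[j] * h
    return lam

def top_eigenpair(A, iters=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(A)
    v = [1.0 + 0.1 * i for i in range(n)]      # deterministic starting vector
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-300:                      # matrix is numerically zero
            return 0.0, v
        v = [x / norm for x in w]
    rayleigh = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
    return rayleigh, v

def eigenvalue_ratio(A):
    """Ratio of the second largest to the largest eigenvalue of a symmetric PSD matrix."""
    lam1, v = top_eigenpair(A)
    n = len(A)
    # Deflate: remove the dominant rank-one component, then iterate again.
    B = [[A[i][j] - lam1 * v[i] * v[j] for j in range(n)] for i in range(n)]
    lam2, _ = top_eigenpair(B)
    return abs(lam2) / lam1
```

For a separable volatility such as ψ(t)φ(T) the ratio is zero up to rounding, while a non-separable function (for example one depending on T − t linearly) gives a strictly positive ratio.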

5.4 Swap rate volatility In Section 3.2, we derived the approximate equation (23) for the equivalent Black volatility of a Libor model swaption. In Table 2 we compare values given by this equation to the true volatility, defined in Section 5 as the volatility implied by the at-the-money simulation price of the corresponding swaption within the Libor model framework. The results indicate that the volatility approximation is quite accurate, with all the values for rank one swaptions within about 12 basis points, with this figure rising to 80 basis points for the more extreme rank two swaptions occurring under the second volatility structure. In general, however, the approximate volatility equation (23) provides a good indication for the Libor model true volatility.


Table 2. Black volatility verification results for both volatility structures.

Volatility  Swaption  Volatility                Swaption maturity
structure   length    description       0.25       1         2         4
1           1         true              4.64%     5.73%    10.14%    17.59%
                      approximation     4.65%     5.74%    10.15%    17.61%
            2         true              6.97%     9.37%    14.23%    18.58%
                      approximation     6.98%     9.38%    14.24%    18.58%
            4         true             14.02%    15.53%    17.56%    18.51%
                      approximation    14.07%    15.57%    17.59%    18.56%
            8         true             15.32%    15.80%    16.57%      -
                      approximation    15.44%    15.90%    16.65%      -
2           1         true             23.16%    19.81%    17.46%    17.76%
                      approximation    23.20%    19.85%    17.50%    17.75%
            2         true             18.60%    16.64%    16.26%    18.06%
                      approximation    18.72%    16.74%    16.17%    18.04%
            4         true             15.79%    15.81%    16.67%    20.24%
                      approximation    15.85%    15.68%    16.41%    20.13%
            8         true             18.37%    19.05%    20.34%      -
                      approximation    17.88%    18.35%    19.54%      -

5.5 Swaption prices Table 3 compares swaption prices for the first volatility structure. Three different prices are given – the true value obtained by simulation, an approximate value obtained by using the Black swaption formula (10) with the swap rate volatility approximation (23), and the Libor model rank one price (8). The prices are expressed in basis points (bp), where 1 bp = $100 per $1M face value. As with the previous swaption volatilities, for the rank one swaptions, the volatility approximation provides a reasonable estimate of the swaption price. As is to be expected, the Libor model price performs better in most situations. The deviation between the true and rank one prices is evident in the rank two swaptions under the second volatility structure (shown in Appendix A), and it is not surprising to note that under these circumstances the volatility approximation mirrors the rank one price more than the true price. In general, however, these results show that a Libor model swaption behaves very much like a Black swaption with the volatility given by Equation (23).


Table 3. Swaption price comparisons for the first volatility structure (all values expressed in basis points).

Swaption          Price                 Swaption maturity
length    Strike  description     0.25       1         2         4
1         IN      true           12.52     30.34     68.87    126.86
                  vol approx     12.53     30.35     68.96    126.93
                  rank 1         12.52     30.32     68.85    126.91
          AT      true            6.18     15.37     37.22     78.97
                  vol approx      6.18     15.37     37.25     79.04
                  rank 1          6.18     15.35     37.20     79.02
          OUT     true            2.29      5.59     13.06     25.16
                  vol approx      2.29      5.59     13.00     25.18
                  rank 1          2.29      5.58     13.05     25.17
2         IN      true           37.29     94.79    178.16    254.77
                  vol approx     37.34     95.08    178.61    254.79
                  rank 1         37.29     94.77    178.07    254.72
          AT      true           18.56     49.42    100.45    160.62
                  vol approx     18.59     49.51    100.54    160.65
                  rank 1         18.56     49.40    100.36    160.56
          OUT     true            6.83     17.86     34.55     50.74
                  vol approx      6.83     17.68     34.17     50.77
                  rank 1          6.83     17.85     34.54     50.69
4         IN      true          140.40    282.57    397.87    475.04
                  vol approx    140.93    283.66    398.71    475.78
                  rank 1        140.38    282.38    397.51    475.20
          AT      true           71.82    154.19    231.58    299.12
                  vol approx     72.09    154.57    231.90    299.90
                  rank 1         71.81    154.02    231.16    299.26
          OUT     true           26.16     54.22     77.45     94.53
                  vol approx     26.04     53.63     77.18     94.79
                  rank 1         26.18     54.24     77.44     94.35
8         IN      true          272.66    507.00    666.23      -
                  vol approx    273.80    509.09    668.80      -
                  rank 1        272.47    506.52    665.36      -
          AT      true          139.66    276.30    383.50      -
                  vol approx    140.79    278.07    385.49      -
                  rank 1        139.57    276.10    382.81      -
          OUT     true           50.09     95.81    128.49      -
                  vol approx     50.68     96.33    129.04      -
                  rank 1         50.12     96.03    128.74      -


Table 4. Delta comparisons for Libor and Black swaptions for the first volatility structure.

Swaption                       Swaption maturity
length    Strike  Model    0.25      1        2        4
1         IN      Black    0.750    0.750    0.750    0.750
                  Libor    0.751    0.751    0.752    0.750
          AT      Black    0.505    0.511    0.529    0.570
                  Libor    0.506    0.512    0.531    0.570
          OUT     Black    0.250    0.250    0.250    0.250
                  Libor    0.251    0.250    0.252    0.250
2         IN      Black    0.750    0.750    0.750    0.750
                  Libor    0.752    0.755    0.755    0.750
          AT      Black    0.507    0.519    0.540    0.574
                  Libor    0.508    0.523    0.545    0.574
          OUT     Black    0.250    0.250    0.250    0.250
                  Libor    0.251    0.255    0.255    0.250
4         IN      Black    0.751    0.750    0.750    0.750
                  Libor    0.756    0.757    0.755    0.751
          AT      Black    0.514    0.531    0.549    0.573
                  Libor    0.519    0.538    0.554    0.574
          OUT     Black    0.249    0.249    0.250    0.249
                  Libor    0.255    0.257    0.254    0.250
8         IN      Black    0.752    0.751    0.751     -
                  Libor    0.755    0.756    0.754     -
          AT      Black    0.515    0.531    0.547     -
                  Libor    0.518    0.536    0.550     -
          OUT     Black    0.248    0.248    0.248     -
                  Libor    0.251    0.253    0.252     -

5.6 Swaption delta The validity of the approximate swaption delta equation is illustrated in Table 4 which compares values for a range of equivalently priced Black and Libor model swaptions at-, in- and out-of-the-money for the first volatility structure. The Black swaption delta is calculated using the true swap rate volatility (see Section 5.4), with the strike values chosen so that the values in- and out-of-the-money are approximately 0.75 and 0.25, respectively. The results show that the approximate method gives good agreement to the Black swaption – showing slight, yet consistent, over-estimation of the true values.


Even for the more extreme swaptions under the second volatility structure (see Appendix A), the agreement is quite acceptable, with the values deviating by 4.5% at most, with the average deviation being 0.1%. Note, however, that this deviation, for both volatility structures, tends to increase slightly as the swaptions move out-of-the-money.

5.7 Swaption gamma and vega Libor model gamma and vega equations (34) and (36) were tested against their Black counterparts (14) and (15), respectively, with the results shown in Table 5. As in Section 5.6, the Black swaption Greeks are calculated using the true volatility, and the same in- and out-of-the-money strike prices are used. Note that the vega results will be entirely analogous to the gamma results, as vega is directly proportional to gamma, as given by (16). We see in general for both gamma and vega that the agreement between the swaption behaviours is not as good as for the delta, yet is still quite acceptable, with most of the Libor model results within 5% of the Black values. Note that the Libor model equations tend to underestimate the values in-the-money, while overestimating out-of-the-money. Note also that the agreement between the values deteriorates with longer swaption maturity and length. This is also true for the second volatility structure, shown in Appendix A.

5.8 Swaption delta-hedging The Libor model equation (30) gives an approximation to the partial derivatives of the swaption price with respect to the swap rate. However, as explained in Section 2.3, in the Black–Scholes framework (or here, in the framework implied by the Black swaption formula) the delta is more than just a partial derivative – it represents a probability of exercise of the option – and is fundamental to the concept of hedging. It would be interesting to know if this concept can also be extended to the case of the approximate Libor model delta. To test this, yield curve movements were simulated in the Libor model framework and swaptions hedged using the methodology from Section 2.3 and the approximate formula (30). Rebalancing was effected at a frequency of five times per quarter, and, due to the lack of true (or simulation) prices and volatilities, the hedging was based on values given by the rank one Libor model price formula (8). For comparison purposes, the delta-hedge was run in conjunction with a Libor model hedge encompassing all the relevant Libor rates treated individually – as predicted from the partial derivatives with respect to the Libor rates given by (28).


Table 5. Gamma and vega comparisons for Libor model and Black swaptions (for the first volatility structure).

Greek   Swaption                       Swaption maturity
type    length    Strike  Model    0.25      1        2        4
Gamma   1         IN      Black    193.5     73.7     28.1     11.3
                          Libor    192.8     73.4     27.9     11.1
                  AT      Black    243.0     92.5     35.3     14.0
                          Libor    242.8     92.5     35.2     13.9
                  OUT     Black    193.5     73.7     28.1     11.3
                          Libor    194.1     73.9     28.4     11.4
        2         IN      Black    124.3     44.1     20.1     10.7
                          Libor    123.4     43.3     19.6     10.4
                  AT      Black    156.1     55.3     25.1     13.2
                          Libor    155.8     55.1     24.9     13.1
                  OUT     Black    124.2     44.1     20.1     10.7
                          Libor    124.7     44.7     20.5     10.9
        4         IN      Black     59.6     26.2     16.1     10.6
                          Libor     58.2     25.3     15.5     10.1
                  AT      Black     74.9     32.8     20.1     13.1
                          Libor     74.5     32.6     19.9     12.9
                  OUT     Black     59.6     26.2     16.1     10.6
                          Libor     60.4     27.0     16.7     11.0
        8         IN      Black     52.9     25.2     16.8      -
                          Libor     51.3     24.0     15.7      -
                  AT      Black     66.6     31.6     21.0      -
                          Libor     65.9     31.2     20.6      -
                  OUT     Black     52.9     25.2     16.8      -
                          Libor     53.6     26.1     17.5      -
Vega    1         IN      Black    0.484    0.208    0.087    0.036
                          Libor    0.482    0.208    0.086    0.036
                  AT      Black    0.607    0.262    0.109    0.045
                          Libor    0.607    0.262    0.109    0.044
                  OUT     Black    0.484    0.208    0.087    0.036
                          Libor    0.485    0.209    0.088    0.037
        2         IN      Black    0.334    0.130    0.062    0.034
                          Libor    0.332    0.128    0.061    0.033
                  AT      Black    0.420    0.164    0.078    0.042
                          Libor    0.419    0.163    0.077    0.042
                  OUT     Black    0.334    0.130    0.062    0.034
                          Libor    0.336    0.132    0.063    0.035
        4         IN      Black    0.172    0.080    0.051    0.035
                          Libor    0.168    0.077    0.049    0.033
                  AT      Black    0.216    0.100    0.063    0.043
                          Libor    0.215    0.099    0.063    0.042
                  OUT     Black    0.172    0.080    0.051    0.035
                          Libor    0.174    0.082    0.052    0.036
        8         IN      Black    0.162    0.080    0.055     -
                          Libor    0.156    0.076    0.051     -
                  AT      Black    0.203    0.100    0.068     -
                          Libor    0.201    0.099    0.067     -
                  OUT     Black    0.161    0.080    0.055     -
                          Libor    0.164    0.082    0.057     -

A more detailed explanation of the mathematics and methodology of the hedging simulation is beyond the scope of this chapter and can be found in Dun et al. (1999). Table 6 presents the results of these hedging tests in the form of means and standard deviations of the hedging profit and loss (P/L) for both volatility structures. A zero mean P/L with a small standard deviation is clearly the preferred outcome in any hedging exercise. The results show that the approximate Libor delta performs equally as well as individual hedges into the Libor rates – both in terms of P/L mean and standard deviation. All the rank one swaptions have been successfully hedged, with average P/Ls close to zero, while the rank two swaptions show some bias. This bias seems to be approximately equal to the difference between the true and rank one prices, and could probably be reduced by using the true volatility as the basis for the hedges rather than a rank one volatility as mentioned above. In general, however, the results imply that the approximate Libor model delta is useful for hedging, and that the intuition attached to the delta value in Black swaptions is also valid in the Libor model framework.

Table 6. Simulated delta hedging means (and standard deviations) for both volatility structures. Values expressed in basis points.

Volatility  Swaption  Hedging                        Swaption maturity
structure   length    method          0.25           1              2              4
1           1         Approx delta    0.0 (2.3)      0.0 (3.0)      0.0 (6.1)      0.1 (8.4)
                      Libor rates     0.0 (2.3)      0.0 (3.0)      0.0 (6.1)      0.1 (8.4)
            2         Approx delta    0.0 (6.7)      0.1 (9.6)      0.0 (14.5)    −0.1 (16.2)
                      Libor rates     0.0 (6.7)      0.1 (9.6)      0.0 (14.5)    −0.1 (16.2)
            4         Approx delta    0.0 (26.7)     0.3 (28.8)    −0.3 (30.8)     0.0 (28.4)
                      Libor rates     0.0 (26.7)     0.3 (28.8)    −0.3 (30.8)     0.0 (28.3)
            8         Approx delta    0.4 (50.2)    −0.6 (52.6)    −0.8 (51.3)      -
                      Libor rates     0.4 (50.2)    −0.6 (52.5)    −0.8 (51.2)      -
2           1         Approx delta    0.0 (13.7)     0.0 (14.4)     0.0 (14.4)     0.0 (8.0)
                      Libor rates     0.0 (13.7)     0.0 (14.4)     0.0 (14.4)     0.0 (8.0)
            2         Approx delta    0.0 (23.5)     0.0 (23.9)     0.2 (21.1)    −0.2 (14.4)
                      Libor rates     0.0 (23.4)     0.0 (23.9)     0.2 (21.0)    −0.2 (14.4)
            4         Approx delta    0.1 (36.4)    −1.4 (36.2)    −4.6 (33.5)    −0.7 (26.8)
                      Libor rates     0.1 (36.4)    −1.4 (36.1)    −4.6 (33.4)    −0.7 (26.8)
            8         Approx delta   −9.8 (64.7)   −15.9 (65.1)   −14.0 (60.6)      -
                      Libor rates    −9.9 (64.7)   −15.9 (65.3)   −14.0 (60.6)      -

6 Conclusions In conclusion, we have derived approximate equations within the lognormal forward Libor model which indicate that swaption pricing in this framework is quite close to market practice. A simple equation can be used to estimate the Black volatility of Libor model swaptions, which can then be priced using the Black swaption formula. Equations for swaption Greeks in the Libor model were derived and shown to retain their Black swaption significance, while Libor model swaptions could be successfully hedged with the swaption delta derived. Estimates are accurate while the assumption of a rank one swaption covariance matrix holds, although even when violated, the estimates are still surprisingly close to the true values. Swaption maturity, length and strike value do exhibit a slight influence on the estimates. Overall, the results support the idea that the Libor model could be used for all swaption pricing – as well as caps and exotics pricing – since it can be calibrated to both caps and swaptions markets simultaneously. Conversely, the results could be used to support the idea in Jamshidian (1997) that models which are robust and adapted to the products being priced should be used – even if this means using mutually exclusive models – since we have shown that the Libor and Black (and hence by extension the swap rate) approaches are, numerically, not so different. This study still leaves some questions unanswered, providing scope for further work. This includes, for example, the derivation of analytic bounds for the approximations presented here, an analysis of the closeness of the models when pricing exotics, and an investigation into the impact of using the assumptions of Section 4.1 to simplify exotics pricing.

Appendix A. Results for the second volatility structure Comparisons of prices, deltas, gammas and vegas for the second volatility structure not tabulated in the body of the paper appear in Tables 7–9.

Appendix B. Rank one and separable volatility If the volatility function is separable, all swaption quadratic variation matrices are of rank one. On the other hand, if a swaption quadratic variation matrix is of rank one, for arbitrary T and $T_i = T + i\delta$, we must have
$$\int_0^t \gamma^2(s,T)\,ds\int_0^t \gamma^2(s,T_i)\,ds = \left[\int_0^t \gamma(s,T)\,\gamma(s,T_i)\,ds\right]^2.$$
The following lemma shows that if this condition is strengthened, separability follows.

Lemma 1 Let the LFM volatility function γ(·) be well behaved, and satisfy
$$\int_0^t \gamma^2(s,u)\,ds\int_0^t \gamma^2(s,v)\,ds = \left[\int_0^t \gamma(s,u)\,\gamma(s,v)\,ds\right]^2 \qquad (37)$$
for all relevant t, u, v. Then γ(·) is separable.

Table 7. Swaption price comparisons for the second volatility structure.

Swaption          Price                 Swaption maturity
length    Strike  description     0.25       1         2         4
1         IN      true           69.84    123.44    159.82    121.77
                  vol approx     69.91    123.62    160.09    121.74
                  rank 1         69.83    123.44    159.87    121.76
          AT      true           36.95     69.31     92.79     75.99
                  vol approx     37.01     69.44     93.04     75.95
                  rank 1         36.94     69.31     92.86     75.98
          OUT     true           13.07     23.62     30.89     24.24
                  vol approx     13.08     23.63     30.98     24.16
                  rank 1         13.06     23.62     30.95     24.20
2         IN      true          121.42    220.52    249.97    229.35
                  vol approx    121.84    221.27    249.57    229.20
                  rank 1        121.26    220.12    250.11    229.37
          AT      true           63.03    120.88    143.94    143.69
                  vol approx     63.43    121.58    143.18    143.52
                  rank 1         62.87    120.52    144.02    143.72
          OUT     true           22.48     41.67     49.02     45.80
                  vol approx     22.66     41.96     48.07     45.55
                  rank 1         22.37     41.50     48.93     45.68
4         IN      true          194.98    343.86    416.41    433.46
                  vol approx    195.34    342.57    413.61    432.54
                  rank 1        194.66    342.88    413.32    433.03
          AT      true          100.24    188.39    241.57    279.52
                  vol approx    100.60    186.81    237.83    278.05
                  rank 1         99.93    187.18    237.41    278.76
          OUT     true           35.99     66.04     83.07     88.45
                  vol approx     36.18     64.78     79.73     86.78
                  rank 1         35.82     65.07     79.32     87.50
8         IN      true          337.74    599.84    728.68      -
                  vol approx    333.90    590.10    715.81      -
                  rank 1        329.26    587.22    719.30      -
          AT      true          178.09    340.50    441.36      -
                  vol approx    173.28    328.00    424.19      -
                  rank 1        167.57    324.92    429.17      -
          OUT     true           65.70    122.64    153.97      -
                  vol approx     62.01    112.37    139.49      -
                  rank 1         57.64    110.47    144.33      -

Table 8. Delta comparisons for Libor model and Black swaptions for the second volatility structure.

Swaption                       Swaption maturity
length    Strike  Model    0.25      1        2        4
1         IN      Black    0.750    0.750    0.750    0.750
                  Libor    0.751    0.751    0.751    0.750
          AT      Black    0.523    0.539    0.549    0.570
                  Libor    0.524    0.540    0.550    0.570
          OUT     Black    0.250    0.249    0.249    0.250
                  Libor    0.250    0.251    0.250    0.250
2         IN      Black    0.751    0.751    0.749    0.750
                  Libor    0.753    0.753    0.750    0.750
          AT      Black    0.519    0.533    0.546    0.572
                  Libor    0.520    0.535    0.546    0.572
          OUT     Black    0.248    0.248    0.252    0.250
                  Libor    0.250    0.250    0.252    0.250
4         IN      Black    0.751    0.749    0.748    0.750
                  Libor    0.752    0.750    0.751    0.752
          AT      Black    0.516    0.532    0.547    0.580
                  Libor    0.516    0.531    0.547    0.582
          OUT     Black    0.249    0.252    0.255    0.252
                  Libor    0.249    0.251    0.250    0.253
8         IN      Black    0.745    0.744    0.745     -
                  Libor    0.759    0.757    0.755     -
          AT      Black    0.518    0.538    0.557     -
                  Libor    0.521    0.543    0.563     -
          OUT     Black    0.257    0.260    0.262     -
                  Libor    0.245    0.254    0.262     -

Proof Set
$$a(t,u) = \sqrt{\int_0^t \gamma^2(s,u)\,ds}, \qquad \dot{a}(t,u) = \frac{\partial a(t,u)}{\partial t},$$
rewrite (37) as
$$\int_0^t \gamma(s,u)\,\gamma(s,v)\,ds = a(t,u)\,a(t,v),$$

Table 9. Gamma and vega comparisons for Libor model and Black swaptions for the second volatility structure.

Greek   Swaption                       Swaption maturity
type    length    Strike  Model    0.25      1        2        4
Gamma   1         IN      Black     32.0     15.9     10.6     10.7
                          Libor     31.8     15.7     10.4     10.6
                  AT      Black     40.1     19.8     13.2     13.2
                          Libor     40.0     19.8     13.1     13.2
                  OUT     Black     32.0     15.8     10.6     10.7
                          Libor     32.1     16.0     10.7     10.8
        2         IN      Black     35.7     17.2     13.0     10.9
                          Libor     35.1     16.8     12.8     10.7
                  AT      Black     44.9     21.6     16.2     13.4
                          Libor     44.6     21.4     16.2     13.4
                  OUT     Black     35.7     17.2     13.0     10.9
                          Libor     35.8     17.4     13.3     11.1
        4         IN      Black     40.7     20.2     14.3     10.4
                          Libor     40.0     20.0     14.2     10.0
                  AT      Black     51.1     25.2     17.7     12.8
                          Libor     50.9     25.5     18.2     12.7
                  OUT     Black     40.6     20.2     14.4     10.4
                          Libor     41.0     20.8     15.1     11.0
        8         IN      Black     39.4     19.3     13.7      -
                          Libor     41.0     19.6     13.5      -
                  AT      Black     48.9     23.9     16.8      -
                          Libor     53.4     25.7     17.6      -
                  OUT     Black     39.6     19.5     13.9      -
                          Libor     43.1     21.8     15.6      -
Vega    1         IN      Black    0.118    0.081    0.078    0.037
                          Libor    0.117    0.080    0.077    0.037
                  AT      Black    0.147    0.101    0.098    0.046
                          Libor    0.147    0.101    0.097    0.046
                  OUT     Black    0.118    0.081    0.078    0.037
                          Libor    0.118    0.082    0.079    0.038
        2         IN      Black    0.163    0.106    0.074    0.036
                          Libor    0.160    0.103    0.073    0.035
                  AT      Black    0.205    0.132    0.092    0.044
                          Libor    0.204    0.131    0.092    0.044
                  OUT     Black    0.163    0.105    0.074    0.036
                          Libor    0.163    0.107    0.076    0.036
        4         IN      Black    0.199    0.101    0.064    0.030
                          Libor    0.196    0.100    0.064    0.029
                  AT      Black    0.250    0.126    0.080    0.036
                          Libor    0.249    0.127    0.082    0.036
                  OUT     Black    0.199    0.101    0.065    0.030
                          Libor    0.200    0.104    0.068    0.031
        8         IN      Black    0.155    0.074    0.046     -
                          Libor    0.161    0.075    0.045     -
                  AT      Black    0.192    0.091    0.056     -
                          Libor    0.210    0.098    0.059     -
                  OUT     Black    0.155    0.074    0.046     -
                          Libor    0.169    0.083    0.052     -

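differentiate with respect to time $t$ to get

$$\frac{\gamma(t,u)\,\gamma(t,v)}{a(t,u)\,a(t,v)} = \frac{\dot a(t,u)}{a(t,u)} + \frac{\dot a(t,v)}{a(t,v)},$$

and then with respect to $v$ to get

$$\frac{\partial}{\partial v}\left[\frac{\gamma(t,v)}{a(t,v)}\right] \Big/ \frac{\partial}{\partial v}\left[\frac{\dot a(t,v)}{a(t,v)}\right] = \frac{a(t,u)}{\gamma(t,u)}. \tag{38}$$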

Since the left hand side of (38) is a function of only t and v, while the right hand side is a function of only t and u, both must be functions of just t. For some function b(t), we must therefore have

$$\int_0^t \gamma^2(s,u)\,ds = b(t)\,\gamma^2(t,u).$$

Differentiation with respect to $t$, rearrangement, and then integration with respect to $t$ gives

$$\frac{\partial \gamma^2(t,u)}{\partial t} \Big/ \gamma^2(t,u) = \frac{1-\dot b(t)}{b(t)},$$

$$\ln\left[\gamma(t,u)\right] = \frac12\int_0^t \frac{1-\dot b(s)}{b(s)}\,ds + c(u),$$


A. Brace, T. Dun and G. Barton

Fig. 3. Graphical representation of the first volatility structure.

where $c(\cdot)$ is an arbitrary function of $u$. Setting

$$\psi(t) = \exp\left(\frac12\int_0^t \frac{1-\dot b(s)}{b(s)}\,ds\right), \qquad \phi(u) = \exp\left(c(u)\right),$$

gives $\gamma(t,u) = \psi(t)\phi(u)$, which is the result.

Appendix C. Yield curve and volatility structures

C.1 Market fit volatility structure

The first volatility structure (Figure 3) is a simple one-factor homogeneous parameterisation to market data, here the UK market data for the first six months of 1997. The yield curve used (Figure 4) is a typical one for that period of time.

C.2 Pathological volatility structure

The second volatility structure was chosen intentionally to be pathological, or representative of an extreme market situation. The functions were also optimised in order to ensure that some of the 15 swaptions to be tested had extreme rank two swaption covariance matrices.


Fig. 4. Forward Libor rates used in conjunction with the first volatility structure.

Fig. 5. Yield curve associated with the second volatility structure.

The functional form chosen for the yield curve was

$$\mathrm{Yield}(T) = \begin{cases} 0.07 + 0.03\,T/3 & \text{for } T < 3, \\ 0.10 - 0.02\,(T-3)/7 & \text{otherwise,} \end{cases}$$


Fig. 6. Graphical representation of the two factors of the second volatility structure.

and is shown in Figure 5, while the equations for the volatility were

$$\gamma_1(t,T) = \begin{cases} 0.05\,(T-t) & \text{for } (T-t) < 6, \\ 0.3 & \text{otherwise,} \end{cases} \qquad \gamma_2(t,T) = 0.3\exp\left(-0.54\,(T-t)\right),$$

and these are graphed in Figure 6.
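The yield curve and the two volatility factors of the second structure are simple closed-form functions, so they can be written down directly. The following is a minimal Python sketch of those formulas; the function names `yield_curve`, `gamma1` and `gamma2` are illustrative choices, not names used by the authors.

```python
import math


def yield_curve(T):
    """Piecewise-linear yield of the second (pathological) structure."""
    if T < 3:
        return 0.07 + 0.03 * T / 3
    return 0.10 - 0.02 * (T - 3) / 7


def gamma1(t, T):
    """First volatility factor: linear in time to maturity, flat at 0.3 from 6 years."""
    time_to_maturity = T - t
    return 0.05 * time_to_maturity if time_to_maturity < 6 else 0.3


def gamma2(t, T):
    """Second volatility factor: exponentially decaying in time to maturity."""
    return 0.3 * math.exp(-0.54 * (T - t))


if __name__ == "__main__":
    # Tabulate the three functions at a few maturities, seen from t = 0.
    for T in (0.0, 1.0, 3.0, 6.0, 10.0):
        print(T, yield_curve(T), gamma1(0.0, T), gamma2(0.0, T))
```

Both branches of the yield curve meet at 10% for T = 3, and both branches of the first factor meet at 0.3 for a time to maturity of 6, so the two piecewise functions are continuous.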

References

Brace, A. (1996), Dual swap and swaption formulae in the normal and lognormal models. University of New South Wales Preprint.
Brace, A. (1998), Simulation in the GHJM and LFM models. FMMA notes.
Brace, A., Gatarek, D. and Musiela, M. (1997), The market model of interest rate dynamics. Math. Finance 7, 127–54.
Dudenhausen, A., Schlögl, E. and Schlögl, L. (1998), Robustness of Gaussian hedges under parameter and model misspecification. Working paper, University of Bonn.
Dun, T., Schlögl, E. and Barton, G. (1999), Simulated swaption delta-hedging in the lognormal forward LIBOR model. Forthcoming in the International Journal of Theoretical and Applied Finance 4(1) 2001.
Glasserman, P. and Zhao, X. (2000), Arbitrage-free discretization of lognormal forward LIBOR and swap rate models. Finance Stochast. 4(1), 35–68.
Hunt, P., Kennedy, J. and Pelsser, A. (1997), Markov functional interest rate models. ABN Amro preprint.
Jamshidian, F. (1997), Libor and swap market models and measures. Finance Stochast. 1, 293–330.
Miltersen, K., Sandmann, K. and Sondermann, D. (1997), Closed form solutions for term structure derivatives with lognormal interest rates. J. Finance 52, 407–30.
Musiela, M. and Rutkowski, M. (1997a), Martingale Methods in Financial Modelling. Springer-Verlag, Berlin.
Musiela, M. and Rutkowski, M. (1997b), Continuous-time term structure models: a forward measure approach. Finance Stochast. 1, 261–91.
Plackett, R.L. (1954), A reduction formula for normal multivariate integrals. Biometrika 41, 351–60.
Rebonato, R. (1999), On the pricing implications of the joint log-normal assumptions for the swaption and cap markets. Journal of Computational Finance 2(3), 57–76.

9 Infinite Dimensional Diffusions, Kolmogorov Equations and Interest Rate Models B. Goldys and M. Musiela

1 Introduction

A common feature of interest rate models is that, taking the Heath, Jarrow and Morton model (Heath et al. 1992) as a starting point, they naturally lead to infinite dimensional Markov processes which describe the arbitrage free dynamics of forward rates. By a forward rate $r(t,x)$ we mean the continuously compounded forward rate prevailing at time $t$ over the time interval $[t+x, t+x+dx]$. Usually, the time evolution of forward curves $r(t,\cdot)$ is completely determined by the initial curve and the volatility structure. The question of how to determine the volatility structure is a delicate one and different approaches can be chosen to address this problem; for possible answers see Musiela (1993), Brace and Musiela (1994), Goldys et al. (1995) or Brace et al. (1997). In this chapter, however, we assume that the volatility structure $\{\sigma(t,x) : t \ge 0, x \ge 0\}$ is a known vector-valued stochastic process. In that case the forward rate process $\{r(t,x) : t \ge 0, x \ge 0\}$ must satisfy the following stochastic partial differential equation

$$dr(t,x) = \frac{\partial}{\partial x}\left(\left(r(t,x) + \frac12|\sigma(t,x)|^2\right)dt + \sigma(t,x)\,dW(t)\right) \tag{1.1}$$

for all $t, x \ge 0$, where $W$ is a $d$-dimensional Brownian motion. It has been shown in Musiela (1993) that (1.1) is sufficient for the nonarbitrage condition. We will concentrate on two models:

• Gaussian $r(t,x)$ model for its theoretical and computational simplicity,
• BGM model.

We start with the derivation of the stochastic PDE which is satisfied by the forward rate process $\{r(t,x) : t, x \ge 0\}$. We model the uncertainty of future interest rate movements using an infinite family of Wiener processes $\{W_k : k \ge 1\}$ defined on the common stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$. We assume that $(\mathcal{F}_t)$ is a $P$-augmentation of the natural filtration $\sigma(W_k(s) : s \le t, k \ge 1)$. Let $\{X(t,x) : t, x \ge 0\}$ be an arbitrary random field. We say that $X$ is adapted to the filtration $(\mathcal{F}_t)$ if $\sigma(X(s,x) : s \le t, x \ge 0) \subset \mathcal{F}_t$ for every $t \ge 0$. Let $P(t,T)$ denote the price at time $t \ge 0$ of a zero coupon bond with maturity $T \ge t$. We assume that

$$P(t,T) = \exp\left(-\int_0^{T-t} r(t,u)\,du\right) \tag{1.2}$$

for a certain measurable random field $\{r(t,x) : t, x \ge 0\}$ which is locally bounded: for every $T > 0$

$$\sup_{t,x \le T} |r(t,x)| < \infty, \quad P\text{-a.s.} \tag{1.3}$$

It follows that the savings account process

$$\beta(t) = \exp\left(\int_0^t r(u,0)\,du\right), \quad t \ge 0,$$

is well defined. The discounted price of the zero coupon bond is defined as

$$N(t,T) = \frac{P(t,T)}{\beta(t)}, \quad t \le T. \tag{1.4}$$
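As a numerical illustration of (1.2) and (1.4), the following Python sketch evaluates the bond price, the savings account and the discounted bond price for a given forward curve and a given short rate path. It is only a sketch under stated assumptions: the helper `integrate` (a composite trapezoidal rule) and the function names are hypothetical, and the inputs `forward_curve` (the map x -> r(t, x) at a fixed time t) and `short_rate` (the map u -> r(u, 0)) are assumed to be supplied by the user as ordinary deterministic functions.

```python
import math


def integrate(f, a, b, n=1000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    if b == a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def bond_price(forward_curve, t, T):
    """P(t, T) = exp(-int_0^{T-t} r(t, u) du), as in (1.2)."""
    return math.exp(-integrate(forward_curve, 0.0, T - t))


def savings_account(short_rate, t):
    """beta(t) = exp(int_0^t r(u, 0) du)."""
    return math.exp(integrate(short_rate, 0.0, t))


def discounted_price(forward_curve, short_rate, t, T):
    """N(t, T) = P(t, T) / beta(t), as in (1.4)."""
    return bond_price(forward_curve, t, T) / savings_account(short_rate, t)


if __name__ == "__main__":
    flat = lambda x: 0.05  # flat 5% curve, used for both inputs
    print(bond_price(flat, 1.0, 3.0))
    print(discounted_price(flat, flat, 1.0, 3.0))
```

For a flat curve the integrals are exact up to rounding, which makes this case a convenient sanity check before plugging in a more realistic curve.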

Theorem 1.1 Let (1.3) hold and let the random field $r$ be adapted to $(\mathcal{F}_t)$. Assume that for every $T > 0$ the process $\{N(t,T) : t \le T\}$ is a $(P, (\mathcal{F}_t))$-martingale and, moreover,

$$E\int_0^R \langle \log N(\cdot,T)\rangle_t\,dT < \infty, \quad R > 0. \tag{1.5}$$

Then there exists a family $\{\sigma_k : k \ge 1\}$ of adapted random fields such that for every $T > 0$ and $k \ge 1$

$$\sup_{t,x \le T} |\sigma_k(t,x)| < \infty, \quad P\text{-a.s.},$$

$$\sum_{k=1}^\infty \int_0^T\!\!\int_0^T \sigma_k^2(t,x)\,dx\,dt < \infty, \quad P\text{-a.s.},$$

and

$$\int_0^x r(t,u)\,du + \int_0^t r(s,0)\,ds = \int_0^{x+t} r(0,u)\,du + \sum_{k=1}^\infty \int_0^t \sigma_k(s, x+t-s)\,dW_k(s) + \frac12\sum_{k=1}^\infty \int_0^t \sigma_k^2(s, x+t-s)\,ds.$$


Proof For every $T > 0$ the process $N(\cdot,T)$ is continuous and positive. Fix $R > 0$ and define the process $N$ for all $t \ge 0$ and $T \in [0,R]$ putting $N(t,T) = N(T,T)$ for $t \ge T$. Then for every $T \le R$ the process $\{N(t,T) : t \le R\}$ is a continuous square integrable martingale. Therefore, for every $T > 0$ there exists a continuous local martingale $M(\cdot,T)$ with $M(0,T) = 0$ such that

$$N(t,T) = P(0,T)\exp\left(-M(t,T) - \frac12\langle M(\cdot,T)\rangle_t\right), \quad T \le R,$$

and $M(t,T) = M(T,T)$ for $t \ge T$. By (1.5) $M(t,\cdot)$ is an $L^2(0,R)$-valued continuous martingale for every $R > 0$. It follows from Theorem 8.2 in Da Prato and Zabczyk (1992) that there exists a family $\{h_k : k \ge 1\}$ of predictable $L^2(0,R)$-valued processes, such that for $t, T \le R$

$$M(t,T) = \sum_{k=1}^\infty \int_0^t h_k(s,T)\,dW_k(s)$$

and

$$E\sum_{k=1}^\infty \int_0^t\!\!\int_0^R h_k^2(s,T)\,dT\,ds < \infty.$$

It is easy to see that the processes $h_k$, $k \ge 1$, may be chosen independently of $R$. Hence, for $t, x \ge 0$ we may define $\sigma_k(t,x) = h_k(t, x+t)$ and then

$$N(t, x+t) = \exp\left(-\int_0^{t+x} r(0,u)\,du - \sum_{k=1}^\infty \int_0^t \sigma_k(s, x+t-s)\,dW_k(s) - \frac12\sum_{k=1}^\infty \int_0^t \sigma_k^2(s, x+t-s)\,ds\right)$$

and the theorem follows.

In the sequel we assume that for each $x \ge 0$

$$dr(t,x) = g(t,x)\,dt + \sum_{k=1}^\infty \tau_k(t,x)\,dW_k(t). \tag{1.6}$$

The random fields $\{g(t,x) : t, x \ge 0\}$ and $\{\tau_k(t,x) : t, x \ge 0\}$, $k \ge 1$, satisfy the following conditions.

(C1) For every $T > 0$
$$\sup_{t,x \le T} |g(t,x)| < \infty, \quad P\text{-a.s.},$$
and for every $T > 0$ and $k \ge 1$
$$\sup_{t,x \le T} |\tau_k(t,x)| < \infty, \quad P\text{-a.s.}$$

(C2) For every $T > 0$
$$\sum_{k=1}^\infty \int_0^T\!\!\int_0^T \tau_k^2(t,x)\,dx\,dt < \infty, \quad P\text{-a.s.}$$

(C3) For every $t > 0$
$$\sigma(g(s,x) : s \le t, x \ge 0) \cup \sigma(\tau_k(s,x) : s \le t, x \ge 0, k \ge 1) \subset \mathcal{F}_t.$$

(C4) $\sigma\{r(0,x) : x \ge 0\} \subset \mathcal{F}_0$ and for every $T > 0$
$$\sup_{x \le T} |r(0,x)| < \infty.$$

Theorem 1.2 Assume that for all $t, x \ge 0$

$$\int_0^x g(t,u)\,du = r(t,x) - r(t,0) + \frac12\sum_{k=1}^\infty \left(\int_0^x \tau_k(t,u)\,du\right)^2. \tag{1.7}$$

Then for all $T > 0$ the process

$$M_T(t) = \frac{P(t,T)}{\beta(t)}, \quad t \in [0,T],$$

is a $P$-local martingale and a $P$-martingale, if in addition the process $\{r(t,x) : t, x \ge 0\}$ is bounded on $[0,T] \times \Omega$ for all $T > 0$.

Proof We have

$$d\log P(t,T) = -d\int_0^{T-t} r(t,u)\,du = r(t,T-t)\,dt - \int_0^{T-t}\left(g(t,u)\,dt + \sum_{k=1}^\infty \tau_k(t,u)\,dW_k(t)\right)du$$

$$= r(t,T-t)\,dt - \left(\int_0^{T-t} g(t,u)\,du\right)dt - \sum_{k=1}^\infty \left(\int_0^{T-t} \tau_k(t,u)\,du\right)dW_k(t).$$

Hence, the quadratic variation of $\log P(\cdot,T)$ is given by

$$d\langle \log P(\cdot,T)\rangle(t) = \sum_{k=1}^\infty \left(\int_0^{T-t} \tau_k(t,u)\,du\right)^2 dt.$$

Therefore,

$$dP(t,T) = P(t,T)\left(r(t,T-t) - \int_0^{T-t} g(t,u)\,du + \frac12\sum_{k=1}^\infty \left(\int_0^{T-t} \tau_k(t,u)\,du\right)^2\right)dt - P(t,T)\sum_{k=1}^\infty \left(\int_0^{T-t} \tau_k(t,u)\,du\right)dW_k(t).$$

By (1.7) with $x = T-t$ the $dt$ coefficient equals $r(t,0)P(t,T)$, so the last equation yields

$$\frac{P(t,T)}{\beta(t)} = P(0,T)\exp\left(-\sum_{k=1}^\infty \int_0^t\left(\int_0^{T-s} \tau_k(s,u)\,du\right)dW_k(s) - \frac12\sum_{k=1}^\infty \int_0^t\left(\int_0^{T-s} \tau_k(s,u)\,du\right)^2 ds\right) \tag{1.8}$$

which concludes the proof.

Remark 1.3 The above theorem has been proved in Musiela (1993) for the finite dimensional Wiener process, that is, for a certain $d \ge 1$, $\tau_k = 0$ for $k > d$. An extension to the case when the number of driving Wiener processes is infinite has been proposed in Santa-Clara and Sornette (1997).

We will reparametrize equation (1.8) putting $T = t + x$. Since

$$P(0, t+x) = \exp\left(-\int_0^{t+x} r(0,u)\,du\right),$$

we find that (1.8) takes the form

$$\frac{P(t,t+x)}{\beta(t)} = \exp\left(-\int_0^{t+x} r(0,u)\,du\right) \cdot \exp\left(-\sum_{k=1}^\infty \int_0^t\left(\int_0^{t+x-s} \tau_k(s,u)\,du\right)dW_k(s) - \frac12\sum_{k=1}^\infty \int_0^t\left(\int_0^{t+x-s} \tau_k(s,u)\,du\right)^2 ds\right). \tag{1.9}$$

Under the appropriate regularity conditions on the coefficients $\tau_k$ we obtain formally from (1.9)

$$r(t,x) = r(0,t+x) + \sum_{k=1}^\infty \int_0^t \tau_k(s, x+t-s)\left(\int_0^{x+t-s} \tau_k(s,u)\,du\right)ds + \sum_{k=1}^\infty \int_0^t \tau_k(s, x+t-s)\,dW_k(s). \tag{1.10}$$

If we assume that $\tau_k(s,x) = f_k(r(u,y) : u \le s, y \ge 0)(x)$ for $k \ge 1$ then (1.10) defines a stochastic integral equation for the random field $\{r(t,x) : t, x \ge 0\}$. Such an approach has been studied in Kennedy (1994) and Hamza and Klebaner (1995).


In this chapter we take another approach, well known in the theory of stochastic partial differential equations. We will transform (1.10) into a stochastic evolution equation in an appropriate function space. To this end we define first a scale of weighted $L^2$-spaces in the following way. First, we assume that for every $t \ge 0$ the forward curve $r(t,x)$ is defined for all $x \ge 0$. Hence, the state of the forward rate process $r(t)$ at time $t$ is the curve $\{r(t,x) : x \ge 0\}$. In order to allow bounded, for example constant, forward rates, we assume that for a certain $\alpha > 0$

$$\int_0^\infty r^2(t,x)\,e^{-\alpha x}\,dx < \infty \quad P\text{-a.s.}$$

It follows that a state space for the process $\{r(t) : t \ge 0\}$ is the space $L^2_\alpha(0,\infty)$ of functions with the finite norm

$$\|f\|_\alpha^2 = \int_0^\infty f^2(x)\,e^{-\alpha x}\,dx.$$

The space $L^2_\alpha(0,\infty)$ is a Hilbert space with the inner product

$$\langle f, g\rangle_\alpha = \int_0^\infty f(x)g(x)\,e^{-\alpha x}\,dx.$$

For $f \in L^2_\alpha(0,\infty)$ we define the semigroup of left shifts

$$S(t)f(\cdot) = f(t + \cdot), \quad t \ge 0.$$
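The weighted inner product and the left shift are easy to experiment with numerically. The Python sketch below approximates the weighted inner product and norm by a truncated trapezoidal rule and implements the shift as a closure; the truncation point `upper`, the grid size `n` and all function names are assumptions of this illustration, not part of the text.

```python
import math


def weighted_inner(f, g, alpha, upper=200.0, n=20000):
    """Approximate <f, g>_alpha = int_0^inf f(x) g(x) exp(-alpha x) dx.

    The integral is truncated at `upper` and evaluated with the composite
    trapezoidal rule; both are choices made for this sketch.
    """
    h = upper / n

    def integrand(x):
        return f(x) * g(x) * math.exp(-alpha * x)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for i in range(1, n):
        total += integrand(i * h)
    return total * h


def weighted_norm(f, alpha, **kwargs):
    """Norm induced by the weighted inner product."""
    return math.sqrt(weighted_inner(f, f, alpha, **kwargs))


def left_shift(t, f):
    """S(t) f = f(t + .), the semigroup of left shifts."""
    return lambda x: f(t + x)


if __name__ == "__main__":
    constant_curve = lambda x: 0.05
    # A constant curve is not square integrable on (0, inf) without the weight,
    # but it has finite weighted norm for every alpha > 0.
    print(weighted_norm(constant_curve, alpha=0.5))
    print(weighted_norm(left_shift(2.0, constant_curve), alpha=0.5))
```

For a constant curve $c$ the squared norm is $c^2/\alpha$, and the shift leaves a constant curve unchanged, which is the reason for introducing the weight $e^{-\alpha x}$ in the first place.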

Then (1.10) may be rewritten as

$$r(t) = S(t)r_0 + \sum_{k=1}^\infty \int_0^t S(t-s)\left[\tau_k(s)\int_0^{\cdot} \tau_k(s,u)\,du\right]ds + \sum_{k=1}^\infty \int_0^t S(t-s)\tau_k(s)\,dW_k(s).$$

We will restrict our considerations to the class of forward rate processes defined by the Markovian dynamics on $L^2_\alpha(0,\infty)$, that is, we assume that $\tau_k(s) = \tau_k(s, r(s))(\cdot) \in L^2_\alpha(0,\infty)$, where the same notation $\tau_k$ is preserved. Then

$$r(t) = S(t)r_0 + \sum_{k=1}^\infty \int_0^t S(t-s)\left[\tau_k(s,r(s))\int_0^{\cdot} \tau_k(s,r(s))(u)\,du\right]ds + \sum_{k=1}^\infty \int_0^t S(t-s)\tau_k(s,r(s))\,dW_k(s). \tag{1.11}$$


Let $G(t,\cdot) : L^2_\alpha(0,\infty) \to L^2_\alpha(0,\infty)$ be defined by the formula

$$G(t,f)(x) = \sum_{k=1}^\infty \tau_k(t,f)(x)\int_0^x \tau_k(t,f)(u)\,du,$$

and let

$$\tau(t,f) = \sum_{k=1}^\infty \tau_k(t,f)\,e_k,$$

where $\{e_k : k \ge 1\}$ is a complete orthonormal system in $L^2_\alpha(0,\infty)$. We denote by

$$W(t) = \sum_{k=1}^\infty W_k(t)\,e_k, \quad t \ge 0,$$

the standard cylindrical Wiener process on $L^2_\alpha(0,\infty)$. By this we mean that $W$ is a process of continuous random functionals on $L^2_\alpha(0,\infty)$ with the properties:

$$\langle W(t), f\rangle \sim N\left(0, t\|f\|^2\right), \quad t \ge 0, \ f \in L^2_\alpha(0,\infty),$$

$$E\,\langle W(t), f\rangle\langle W(s), g\rangle = \langle f, g\rangle \min(s,t).$$

Then, (1.11) takes the form of the following integral equation in $L^2_\alpha(0,\infty)$:

$$r(t) = S(t)r_0 + \int_0^t S(t-s)G(s,r(s))\,ds + \int_0^t S(t-s)\tau(s,r(s))\,dW(s). \tag{1.12}$$

Definition 1.4 The $L^2_\alpha(0,\infty)$-valued $(\mathcal{F}_t)$-predictable process $r$ is a solution to (1.12) with the initial condition $r_0 \in L^2_\alpha(0,\infty)$ if

(a) for all $t \ge 0$
$$\int_0^t \|G(s,r(s))\|\,ds + \int_0^t \|\tau(s,r(s))\|_2^2\,ds < \infty, \quad P\text{-a.s.},$$
where
$$\|\tau(s,r(s))\|_2^2 = \sum_{k=1}^\infty \|\tau_k(s,r(s))\|^2;$$

(b) for every $t \ge 0$ equation (1.12) holds $P$-a.s.

In the theorem below we use the general theory of equations of type (1.12) developed in Da Prato and Zabczyk (1992) to provide conditions for existence and uniqueness of solutions to (1.12).


Theorem 1.5 Assume that piecewise continuous functions $\tau_k : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}_+$, $k \le d$, satisfy the following conditions: for every $T > 0$ there exists $C_T > 0$ such that

$$\sup_{x \ge 0,\, t \le T} \tau_k(t,x) < \infty,$$

$$|\tau_k(t,x) - \tau_k(t,y)| \le C_T|x - y|, \quad t \le T.$$

Then for every $\alpha > 0$ there exists a unique solution to (1.12) for every $r_0 \in L^2_\alpha(0,\infty)$.

Remark 1.6 The above theorem does not assure positivity of forward rates. If we assume that $r_0 \ge 0$ then under appropriate conditions on $\tau_k$ one may obtain existence and uniqueness of nonnegative solutions. We do not pursue this topic here. For an example of equation (1.12) with nonnegative solutions see Goldys et al. (1995).

It is well known that equation (1.10) is intimately related to a stochastic partial differential equation

$$dr(t,x) = \left(\frac{\partial r}{\partial x}(t,x) + \sum_{k=1}^\infty \tau_k(t, r(t,x))\int_0^x \tau_k(t, r(t,y))\,dy\right)dt + \sum_{k=1}^\infty \tau_k(t, r(t,x))\,dW_k(t), \qquad r(0,x) = r_0(x). \tag{1.13}$$

We will discuss this relationship at the level of the evolution equation (1.12). In the space $L^2_\alpha(0,\infty)$ we introduce an operator $A = \frac{\partial}{\partial x}$ with the domain

$$\mathrm{dom}(A) = H^1_\alpha(0,\infty) = \left\{f \in L^2_\alpha(0,\infty) : \int_0^\infty \left(\frac{\partial f}{\partial x}(x)\right)^2 e^{-\alpha x}\,dx < \infty\right\},$$

where the derivative is meant in the generalized sense. Equation (1.13) considered in $L^2_\alpha(0,\infty)$ takes the form

$$dr(t) = \left(Ar(t) + G(t,r(t))\right)dt + \tau(t,r(t))\,dW(t), \qquad r(0) = r_0. \tag{1.14}$$

The latter equation, however, does not need to have classical solutions unless further regularity conditions are imposed on the data (see below). In general we define a solution to (1.14) in the mild sense as a solution to (1.12). The relationship between the two equations is clarified by the next theorem, which follows from the general theory developed in Da Prato and Zabczyk (1992).


Theorem 1.7 Assume that the functions $\tau_k$, $k \le d$, satisfy the assumptions of Theorem 1.5 and let $r$ be a solution to (1.12). Then the following holds.

(i) Equation (1.13) holds $x$-a.e. if and only if $\tau_k(t,\cdot) \in H^1_\alpha$ for all $t \ge 0$ and $r_0 \in H^1_\alpha$.

(ii) There exist sequences $\left(\tau_k^n(t,\cdot)\right), \left(r_0^n\right) \subset H^1_\alpha$, $k \le d$, converging in the $L^2_\alpha(0,\infty)$-norm to $\tau_k(t,\cdot)$ and $r_0$ respectively and such that the corresponding solutions $r^n$ of (1.13) satisfy the condition

$$\lim_{n\to\infty} E\int_0^T \|r^n(t) - r(t)\|_\alpha^2\,dt = 0.$$

Proof The standard proof of this theorem is omitted.

2 The BGM Model

In this section our starting point is the model of the Libor rate process proposed in Brace et al. (1997). Let $L(t,x)$ denote the Libor rate process defined by the formula

$$1 + \delta L(t,x) = \frac{P(t,t+x)}{P(t,t+x+\delta)}, \quad t, x \ge 0,$$

where $\delta > 0$ (for example $\delta = 0.25$) is fixed. We assume that all zero coupons may be expressed in terms of a certain forward rate process $r$ given in (1.2) but we shift our attention to the process $\log L(t,x)$ which is supposed to satisfy an equation

$$d\left(\log L(t,x)\right) = \alpha(t,x)\,dt + \gamma(t,x)\,dW(t), \quad x \ge 0, \tag{2.1}$$

where $W$ is a $d$-dimensional Wiener process. We need conditions on the drift term $\alpha$ which assure that there is no arbitrage. We assume that the measurable function $\gamma : [0,\infty) \times [0,\infty) \to \mathbb{R}^d$ is deterministic and

$$M_\gamma = \sup_{t,x>0} |\gamma(t,x)| + \sup_{t \ge 0,\, x \le \delta}\ \sum_{k=0}^\infty |\gamma(t, x+k\delta)| < \infty. \tag{2.2}$$
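The defining relation for the Libor rate can be inverted directly for $L(t,x)$. The following Python sketch does this for two given bond prices; the function names are hypothetical, and the flat continuously compounded curve in the usage example is an assumption made only to have concrete numbers.

```python
import math


def libor_rate(p_near, p_far, delta):
    """Solve 1 + delta * L = P(t, t+x) / P(t, t+x+delta) for L.

    p_near stands for P(t, t+x) and p_far for P(t, t+x+delta).
    """
    return (p_near / p_far - 1.0) / delta


def log_libor(p_near, p_far, delta):
    """log L(t, x), the quantity whose dynamics are modelled in (2.1)."""
    return math.log(libor_rate(p_near, p_far, delta))


if __name__ == "__main__":
    # Assumed example: flat forward curve at 5%, so P(t, t+m) = exp(-0.05 m).
    rate, delta, x = 0.05, 0.25, 1.0
    bond = lambda m: math.exp(-rate * m)
    print(libor_rate(bond(x), bond(x + delta), delta))
```

With a flat continuously compounded rate $\rho$ the Libor rate equals $(e^{\rho\delta} - 1)/\delta$, slightly above $\rho$ because Libor is simply compounded over the accrual period $\delta$.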

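Let $l$ be a solution to the following stochastic evolution equation in $L^2_\alpha(0,\infty)$:

$$dl(t) = \left(Al(t) + F(t,l(t))\right)dt + \gamma(t)\,dW(t), \qquad l(0) = \phi \in L^2_\alpha(0,\infty), \tag{2.3}$$

where

$$F(t,\phi)(x) = \sum_{k=0}^{[x/\delta]} \frac{\delta\exp\left(\phi(x-k\delta)\right)}{1+\delta\exp\left(\phi(x-k\delta)\right)}\,\langle\gamma(t,x-k\delta), \gamma(t,x)\rangle - \frac12|\gamma(t,x)|^2.$$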
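To make the structure of the drift $F$ concrete, here is a minimal Python sketch that evaluates $F(t,\phi)(x)$ pointwise. It assumes a one-factor model ($d = 1$), so that the inner product of volatilities is an ordinary product; `bgm_drift`, `phi` and `gamma` are illustrative names, with `phi` the log-Libor curve and `gamma` the deterministic volatility.

```python
import math


def bgm_drift(phi, gamma, t, x, delta):
    """Evaluate F(t, phi)(x) for a scalar (d = 1) volatility gamma(t, x).

    phi   : callable y -> phi(y), the log-Libor curve
    gamma : callable (t, y) -> gamma(t, y), deterministic volatility
    """
    total = 0.0
    for k in range(int(math.floor(x / delta)) + 1):  # k = 0, ..., [x/delta]
        y = x - k * delta
        weight = delta * math.exp(phi(y))
        total += weight / (1.0 + weight) * gamma(t, y) * gamma(t, x)
    return total - 0.5 * gamma(t, x) ** 2


if __name__ == "__main__":
    # Assumed example: flat Libor curve at 5%, flat volatility of 20%.
    phi = lambda y: math.log(0.05)
    gamma = lambda t, y: 0.2
    print(bgm_drift(phi, gamma, 0.0, 0.6, 0.25))
```

Each weight $\delta e^{\phi}/(1+\delta e^{\phi})$ lies strictly between 0 and 1, which is what makes $F$ bounded in $\phi$ for a summable volatility, as in condition (2.2).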


If this equation has a solution then we may define the process $L$ via the formula $l(t,x) = \log L(t,x)$. In turn (2) allows us to define the family of zero coupons and finally the forward rate process $r(t)$ can be defined provided the appropriate regularity conditions are satisfied. It was shown in Brace et al. (1997) that if $l$ is a solution to (2.3) then the corresponding process of forward rates satisfies the nonarbitrage condition (1.5).

Theorem 2.1 Assume (2.1). Then the following holds.

(a) For every $\alpha > 0$ there exists a unique solution to (2.3) in the space $L^2_\alpha(0,\infty)$.

(b) Let $\alpha \le 0$ and
$$N_\gamma^2 = \sup_{t \ge 0} \int_0^\infty e^{-\alpha x}|\gamma(t,x)|^2\,dx < \infty. \tag{2.4}$$
Then there exists a unique solution to (2.3) in $L^2_\alpha(0,\infty)$.

Proof Note first that

$$|F(t,\phi)(x)| \le |\gamma(t,x)|\sum_{k=0}^{[x/\delta]} |\gamma(t,x-k\delta)| + \frac12|\gamma(t,x)|^2$$

and therefore

$$\int_0^\infty e^{-\alpha x}|F(t,\phi)(x)|^2\,dx \le 2\int_0^\infty e^{-\alpha x}|\gamma(t,x)|^2\left(\sum_{k=0}^{[x/\delta]} |\gamma(t,x-k\delta)|\right)^2 dx + \frac12\int_0^\infty e^{-\alpha x}|\gamma(t,x)|^4\,dx$$

$$\le 2\sum_{n=0}^\infty e^{-\alpha\delta n}\int_0^\delta |\gamma(t,x+n\delta)|^2\left(\sum_{k=0}^{n} |\gamma(t,x+k\delta)|\right)^2 dx + \frac12 M_\gamma^2\int_0^\infty e^{-\alpha x}|\gamma(t,x)|^2\,dx. \tag{2.5}$$

Therefore, for $\alpha > 0$

$$\|F(t,\phi)\|^2 \le 2\delta M_\gamma^4\sum_{n=0}^\infty n^2 e^{-\alpha\delta n} + \frac{1}{2\alpha}M_\gamma^4 < \infty.$$

If $\alpha \le 0$ then (2.3), (2.4) and (2.5) yield

$$\|F(t,\phi)\|^2 \le \frac32 M_\gamma^2\,\|\gamma(t)\|^2.$$


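Hence, for every $\alpha \in \mathbb{R}$ the mapping $F : [0,\infty) \times L^2_\alpha(0,\infty) \to L^2_\alpha(0,\infty)$ is uniformly bounded. We will show now that

$$\|F(t,\phi) - F(t,\psi)\| \le M_F\|\phi - \psi\|, \quad \phi, \psi \in L^2_\alpha(0,\infty). \tag{2.6}$$

Since

$$\left|\frac{e^x}{1+e^x} - \frac{e^y}{1+e^y}\right| \le \frac12|x-y|,$$

we obtain, proceeding similarly as in (2.5),

$$\|F(t,\phi) - F(t,\psi)\|^2 \le \frac14\sum_{n=0}^\infty e^{-\alpha\delta n}\int_0^\delta |\gamma(t,x+n\delta)|^2\left(\sum_{k=0}^n |\gamma(t,x+k\delta)|\,|(\phi-\psi)(x+k\delta)|\right)^2 dx$$

$$\le \frac14 M_\gamma^2\sum_{n=0}^\infty e^{-\alpha\delta n}\int_0^\delta |\gamma(t,x+n\delta)|^2\left(\sum_{k=0}^n (\phi-\psi)^2(x+k\delta)\right)dx. \tag{2.7}$$

Hence, if $\alpha > 0$ then

$$\|F(t,\phi) - F(t,\psi)\|^2 \le \frac14 M_\gamma^4\int_0^\delta\left(\sum_{n=0}^\infty e^{-\alpha\delta n}\sum_{k=0}^n (\phi-\psi)^2(x+k\delta)\right)dx$$

$$= \frac14 M_\gamma^4\int_0^\delta\sum_{k=0}^\infty (\phi-\psi)^2(x+k\delta)\sum_{n=k}^\infty e^{-\alpha\delta n}\,dx$$

$$= \frac{M_\gamma^4}{4\left(1-e^{-\alpha\delta}\right)}\sum_{k=0}^\infty\int_0^\delta e^{-\alpha\delta k}(\phi-\psi)^2(x+k\delta)\,dx$$

$$\le \frac{M_\gamma^4}{4\left(1-e^{-\alpha\delta}\right)}\,e^{\alpha\delta}\sum_{k=0}^\infty\int_{k\delta}^{(k+1)\delta} e^{-\alpha x}(\phi-\psi)^2(x)\,dx = \frac{M_\gamma^4}{4\left(1-e^{-\alpha\delta}\right)}\,e^{\alpha\delta}\,\|\phi-\psi\|^2$$

and (2.6) follows. Assume now that $\alpha \le 0$. Then by the first inequality in (2.7)

$$\|F(t,\phi) - F(t,\psi)\|^2 \le \frac14\sum_{n=0}^\infty e^{-\alpha\delta n}\int_0^\delta |\gamma(t,x+n\delta)|^2\left(\sum_{k=0}^n |\gamma(t,x+k\delta)|\,|(\phi-\psi)(x+k\delta)|\right)^2 dx$$

$$\le \frac14 N_\gamma^2\int_0^\delta\left(\sum_{k=0}^\infty |\gamma(t,x+k\delta)|^2\right)\left(\sum_{k=0}^\infty e^{-\alpha\delta k}(\phi-\psi)^2(x+k\delta)\right)dx$$

$$\le \frac14 N_\gamma^4\sum_{k=0}^\infty\int_{k\delta}^{(k+1)\delta} e^{-\alpha x}(\phi-\psi)^2(x)\,dx = \frac14 N_\gamma^4\,\|\phi-\psi\|^2.$$

Finally, Theorem 7.4 in Da Prato and Zabczyk (1992) yields existence of a unique solution to equation (2.3).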

3 Kolmogorov equations

The classical Black–Scholes formula for a European option price has been derived by solving a partial differential equation identified by means of heuristic arguments (cf. Black and Scholes 1973). Later on a probabilistic interpretation of the above arguments allowed the derivation to be made rigorous (Harrison and Pliska 1981). Let us recall briefly the main ideas of this approach. Assume that the price $X(t)$ of a stock is a positive continuous semimartingale such that the logarithm of the stock price has a deterministic quadratic variation $\langle \log X\rangle_t = \sigma^2 t$. Then some mild technical conditions imply existence of a unique probability measure under which for every $t \ge 0$

$$X(t) = X_0 + \int_0^t rX(s)\,ds + \int_0^t \sigma X(s)\,dW(s).$$

Moreover, for a given maturity $T$ and a strike price $K$ we can calculate the price of a European put option by taking the conditional expectation of the discounted option payoff, i.e.,

$$V_T(t,x) = e^{-r(T-t)}E\left((K - X(T))^+ \mid X(t) = x\right)$$

for $t \le T$. Since $X$ is a strong Feller process with the infinitesimal generator

$$L = rx\frac{\partial}{\partial x} + \frac12\sigma^2x^2\frac{\partial^2}{\partial x^2}$$

we can apply the Feynman–Kac formula and identify the function $V_T$ with a unique solution of the backward Kolmogorov equation

$$\frac{\partial u}{\partial t}(t,x) + \frac12\sigma^2x^2\frac{\partial^2 u}{\partial x^2}(t,x) + rx\frac{\partial u}{\partial x}(t,x) - ru(t,x) = 0 \tag{3.1}$$

with the terminal condition $u(T,x) = (K-x)^+$.

In this section we investigate whether this strategy can be applied to interest rate options in general term structure models. Consider a European swaption, an option with maturity $T$ on a swap with the cashflows $C_i$, $i = 1, \ldots, n$ at times $T_i$, $i = 1, 2, \ldots, n$ such that T < T1

$$\sup_{t \le T} E\|r(t,\zeta)\|^p \le C_{T,p}\left(1 + E\|\zeta\|^p\right).$$

If $\tau(t,\cdot)$ is Fréchet differentiable on $L^2_\alpha(0,\infty)$ then for every $t \ge 0$ the mapping $\phi \to r(t,\phi)$ is Fréchet differentiable $P$-a.s. In general the solution to (3.5) is not a semimartingale but for every $\psi \in \mathrm{dom}(A^*) = \left\{\phi \in H^1 : \phi(0) = 0\right\}$

$$\langle r(t), \psi\rangle = \langle\phi, \psi\rangle + \int_0^t \langle r(s), A^*\psi\rangle\,ds + \int_0^t \langle F(s,r(s)), \psi\rangle\,ds + \int_0^t \langle G^*(s,r(s))\psi, dW(s)\rangle \tag{3.3}$$

and hence $\langle r(t), \psi\rangle$ is a semimartingale and so is the multidimensional process $\left(\langle r(t), \psi_1\rangle, \ldots, \langle r(t), \psi_n\rangle\right)$ for any $n$ and arbitrary collection of $\psi_1, \ldots, \psi_n \in \mathrm{dom}(A^*)$. It follows that the process $r$ is an $L^2([0,T] \times \Omega, \lambda \otimes P)$-limit of semimartingales for every $T > 0$. This property will be used later on in the discussion of the Kolmogorov equation. The following property of the process

$$R(t,\phi) = \int_0^t r(s,\phi)\,ds$$

will be useful.

Lemma 3.1 For every $T > 0$ there exists $c_T > 0$ such that

$$\sup_{t \le T} E\|R(t,\phi) - R(t,\psi)\|_1 \le c_T\|\phi - \psi\|.$$

Proof The standard proof of this lemma is omitted.

Let us go back now to the problem of pricing interest rate dependent options. To begin with, note that in the present terminology the price of a zero coupon bond can be rewritten as follows. Let

$$B_T(t,\phi) = e^{-\langle\phi,\, S(t)I_{[0,T]}\rangle},$$

with $I_{[0,T]}$ denoting the indicator function of the interval $[0,T]$. It follows that $P(t,T) = B_T(t,r(t))$. Any measurable mapping $F : L^2_\alpha(0,\infty) \to \mathbb{R}$ such that

$$\sup E\,|F(r(T))|\exp\left(-\int_0^T r(u,0)\,du\right)$$

0 and $a > 0$

$$\sup_{\|\phi\| \le a} E\exp\left(2p\|r(t,\phi)\| - 2\int_0^t \delta(r(s,\phi))\,ds\right) < \infty.$$

If $r(t,\phi) \in H^1$ for every $t \ge 0$ and $\phi \in H^1$ then we will need an $H^1$-version of (A):

(A′) We assume $\alpha = 0$. Moreover, there exists $p \ge 0$ such that for every $t > 0$ and $a > 0$

$$\sup_{\|\phi\| \le a} E\exp\left(2p\|r(t,\phi)\|_1 - 2\int_0^t \delta(r(s,\phi))\,ds\right) < \infty.$$


We will show that (A′) holds if $r$ is a Gaussian process. If the process $r$ is nonnegative then the results presented below are valid and the assumption (A) is not needed. In general the term $\exp\left(-\int_0^t \delta(r(s,\phi))\,ds\right)$ can grow exponentially.

Proposition 3.2 If (A) holds for a certain $p \ge 0$ then, putting $H = L^2(0,\infty)$, $P_t^\delta C_p(H) \subset C(H)$ and $P_t^\delta C_p\left(H^1\right) \subset C\left(H^1\right)$ for every $t \ge 0$.

Proof We provide the proof for $H^1$ only. Let $F \in C_p\left(H^1\right)$ and let $(\phi_n) \subset H^1$ be a sequence converging in $H^1$ to $\phi$. Then $F(\phi) = e^{-p\|\phi\|_1}G(\phi)$ with $G \in C_0\left(H^1\right)$ and

$$\left|P_t^\delta F(\phi)\right| \le \|G\|_0\,E\exp\left(p\|r(t,\phi)\|_1 - \int_0^t \delta(r(u,\phi))\,du\right).$$

Hence in view of (A′) $P_t^\delta F(\phi)$ is well defined. Moreover, (A′) yields uniform integrability of the family of random variables

$$\left\{\exp\left(p\|r(t,\phi)\|_1 - \int_0^t \delta(r(u,\phi))\,du\right) : \|\phi\| \le a\right\}$$

for every $a > 0$. Hence the proposition follows from the continuity of $F$ and Lemma (3.3).

Remark 3.3 The above theorem may be proved for any $\alpha \in \mathbb{R}$. However, the Kolmogorov equation we are going to study next is simpler in $L^2(0,\infty)$.

We shall identify the infinitesimal generator $L$ of the Markov process $r$. Because the process $r$ is not a semimartingale we cannot apply the Itô formula to the function $F(r(t,\phi))$ even if $F \in C_p^2(H_\alpha)$. However, it turns out that the property (3.3) is sufficient for our needs. Let $\psi_1, \ldots, \psi_n \in \mathrm{dom}(A^*)$ and let $P_n$ denote the orthogonal projection on the linear span $H_n$ of the vectors $\psi_1, \ldots, \psi_n$. First, let us define the space

$$D_0 = \left\{F \in C_p(H_\alpha) : F = f \circ P_n,\ f \in C_p^2\left(\mathbb{R}^n\right),\ n = 1, \ldots\right\}.$$

If $F \in D_0$ then in view of (3.3) the process $F(r(t,\phi))$ is a semimartingale and

$$F(r(t,\phi)) = F(\phi) + \int_0^t LF(r(s,\phi))\,ds + \int_0^t \langle DF(r(s,\phi)), \tau(r(s,\phi))\,dW(s)\rangle, \tag{3.6}$$

where

$$LF(\phi) = \frac12\langle D^2F(\phi)\tau(\phi), \tau(\phi)\rangle + \langle\phi, A^*DF(\phi)\rangle + \langle G(\phi), DF(\phi)\rangle.$$

If $F \in D_0$ then the function $A^*DF(\phi)$ is well-defined for all $\phi \in L^2(0,\infty)$ and therefore $LF(\phi)$ is a well-defined continuous function on $L^2(0,\infty)$. The above


considerations show that the generator of the Markov process $r$ coincides on $D_0$ with the operator $L$. Therefore we can expect that $V_T$ as defined in (3.5) is a Feynman–Kac formula for the solution of the following equation

$$\frac{\partial u}{\partial t}(t,\phi) + Lu(t,\phi) - \delta(\phi)u(t,\phi) = 0, \qquad u(T,\phi) = F(\phi). \tag{3.7}$$

In other words the operator $L^\delta = L - \delta$ when considered on an appropriate domain is a generator of the semigroup $P_t^\delta$. However, equation (3.7) is not valid in general because $V_T(t,\cdot)$ need not be differentiable.

Proposition 3.4 Assume that $\tau$ and $G$ are twice differentiable on $H$. Then for every $F \in C_p^2(H)$ the function $V_T$ is a unique solution of the backward Kolmogorov equation (3.7) in the following sense.

• The function $V_T : [0,\infty) \times H \to \mathbb{R}$ is bounded and continuous with respect to each variable.
• For every $t \ge 0$ we have $V_T(t,\cdot) \in C^2(H)$.
• We have $V_T \in C^1([0,T], H^1)$.
• Equation (3.7) holds for every $\phi \in \mathrm{dom}(A)$ and $t \ge 0$.

Moreover, $V_T$ is given by (3.5).

Proof Let $\delta_n$ denote a sequence of $C^2$ functions on $\mathbb{R}$ such that $\langle\delta_n, \phi\rangle \to \delta(\phi)$ for every continuous $\phi$ and let $L^n = L - \delta_n$. If we denote by $P_t^n$ the semigroup

$$P_t^nF(\phi) = E\left[\exp\left(-\int_0^t \langle\delta_n, r(u,\phi)\rangle\,du\right)F(r(t,\phi))\right]$$

then by a simple modification of the proof of Theorem 9.17 in Da Prato and Zabczyk (1992) we can show, putting $u^n(t,\phi) = P_t^nF(\phi)$, that

$$\frac{\partial u^n}{\partial t}(t,\phi) + Lu^n(t,\phi) - \langle\delta_n, \phi\rangle u^n(t,\phi) = 0, \qquad u^n(T,\phi) = F(\phi), \tag{3.8}$$

and moreover $u^n$ is a unique solution of (3.8). We shall show first that for every $\phi \in H$

$$\lim_{n\to\infty} P_t^nF(\phi) = P_t^\delta F(\phi). \tag{3.9}$$

Indeed,

$$\left|P_t^nF(\phi) - P_t^\delta F(\phi)\right| \le \|F\|_p\,E\left[e^{p\|r(t,\phi)\|}\left|\exp\left(-\int_0^t \langle\delta_n, r(u,\phi)\rangle\,du\right) - \exp\left(-\int_0^t \delta(r(u,\phi))\,du\right)\right|\right]$$


and therefore (A) and the definition of $\delta_n$ yield (3.9). Using (3.9) and Theorem 9.16 in Da Prato and Zabczyk (1992) we obtain easily that the right-hand side of (3.8) converges (along the subsequence $n_k$) to the expression

$$LP_t^\delta F(\phi) - \delta(\phi)P_t^\delta F(\phi)$$

for every $\phi \in H^1_\alpha$ uniformly in $t \le T$. Hence

$$\lim_{k\to\infty}\frac{\partial u^{n_k}}{\partial t}(t,\phi) = \frac{\partial P_t^\delta F}{\partial t}(\phi)$$

and therefore $P_t^\delta F$ satisfies (3.7).

Unfortunately, the assumptions of this theorem are too strong for it to be applicable to some important contingent claims like swaptions. Stronger results can be obtained in the Gaussian case.

Proposition 3.5 The mapping $u$ is a solution of (3.7) if and only if $u(t,\phi) = B_T(t,\phi)R_T(t,\phi)$, $R_T(T,\phi) = F(\phi)$ and

$$\frac{\partial R_T}{\partial t}(t,\phi) + \frac12\langle D^2R_T(t,\phi)\tau(\phi), \tau(\phi)\rangle + \langle DR_T(t,\phi), A\phi + G(\phi)\rangle - \langle DR_T(t,\phi), \tau(\phi)\rangle\langle\tau(\phi), S(t)I_{[0,T]}\rangle = 0, \tag{3.10}$$

where the solution is defined in the sense of Proposition 3.4.

Proof Let $u$ satisfy (3.7) and define the function $R_T$ by the formula $u(t,\phi) = B_T(t,\phi)R_T(t,\phi)$. Then $R_T$ is smooth and

$$\frac{\partial u}{\partial t}(t,\phi) = \phi(T-t)B_T(t,\phi)R_T(t,\phi) + B_T(t,\phi)\frac{\partial R_T}{\partial t}(t,\phi), \tag{3.11}$$

$$Du(t,\phi) = -B_T(t,\phi)R_T(t,\phi)S(t)I_{[0,T]} + B_T(t,\phi)DR_T(t,\phi), \tag{3.12}$$

$$D^2u(t,\phi) = B_T(t,\phi)R_T(t,\phi)\,S(t)I_{[0,T]} \otimes S(t)I_{[0,T]} - 2B_T(t,\phi)DR_T(t,\phi) \otimes S(t)I_{[0,T]} + B_T(t,\phi)D^2R_T(t,\phi). \tag{3.13}$$

Hence by (3.12)

$$\langle Du(t,\phi), A\phi + G(\phi)\rangle = -B_T(t,\phi)R_T(t,\phi)\left(\phi(T-t) - \phi(0) + \frac12\langle S(t)I_{[0,T]}, \tau(\phi)\rangle^2\right) + B_T(t,\phi)\langle DR_T(t,\phi), A\phi + G(\phi)\rangle \tag{3.14}$$

and by (3.13)

$$\langle D^2u(t,\phi)\tau(\phi), \tau(\phi)\rangle = B_T(t,\phi)R_T(t,\phi)\langle S(t)I_{[0,T]}, \tau(\phi)\rangle^2 - 2B_T(t,\phi)\langle DR_T(t,\phi), \tau(\phi)\rangle\langle S(t)I_{[0,T]}, \tau(\phi)\rangle + B_T(t,\phi)\langle D^2R_T(t,\phi)\tau(\phi), \tau(\phi)\rangle. \tag{3.15}$$


Finally, taking into account (3.11), (3.14) and (3.15) we find that

$$\frac{\partial u}{\partial t}(t,\phi) + \frac12\langle D^2u(t,\phi)\tau(\phi), \tau(\phi)\rangle + \langle Du(t,\phi), A\phi + G(\phi)\rangle - \delta(\phi)u(t,\phi)$$

$$= B_T(t,\phi)\left[\frac{\partial R_T}{\partial t}(t,\phi) + \frac12\langle D^2R_T(t,\phi)\tau(\phi), \tau(\phi)\rangle + \langle DR_T(t,\phi), A\phi + G(\phi)\rangle - \langle DR_T(t,\phi), \tau(\phi)\rangle\langle S(t)I_{[0,T]}, \tau(\phi)\rangle\right]$$

and (3.10) follows. Using similar arguments we show that if $R_T$ satisfies (3.10) then $u(t,\phi) = B_T(t,\phi)R_T(t,\phi)$ is a solution to (3.7).

Remark 3.6 Proposition 3.5 describes the forward measure transformation performed at the level of the Kolmogorov equation. Note that equation (3.10) is the Kolmogorov equation for the process $Y$ (say) defined as a solution to the stochastic differential equation

$$dY = \left(AY + G_\sigma(Y) - \langle\tau(Y), S(t)I_T\rangle\,\tau(Y)\right)dt + \tau(Y)\,dW$$

or in a more explicit form

$$dY(t,x) = \left(\frac{\partial Y}{\partial x}(t,x) + \tau(Y(t))(x)\int_0^x \tau(Y(t))(u)\,du\right)dt - \tau(Y(t))(x)\int_0^{T-t}\tau(Y(t))(u)\,du\,dt + \tau(Y(t))(x)\,dW(t).$$

From this point on we assume that $\tau \in H$ is a constant vector and therefore

$$r(t) = S(t)\phi + \int_0^t S(s)G\,ds + \int_0^t S(t-s)\tau\,dW(s).$$

This case has been discussed in Musiela (1993) and Brace and Musiela (1994). For every $t \ge 0$ the random variable $r(t)$ is Gaussian with the mean

$$Er(t) = S(t)\phi + \int_0^t S(s)G\,ds$$

and the covariance operator

$$Q_t = \int_0^t S(s)\tau\tau^*S^*(s)\,ds.$$

Moreover, because $r(t,\phi)$ is Gaussian so is $R(t,\phi)(0)$. Hence, using the Hölder inequality we check by direct calculations that for $t \le T$

$$E\exp\left(2p\|r(t,\phi)\|_\alpha - 2R(t,\phi)(0)\right) \le C_T\exp\left(\beta_T\|\phi\|\right)$$


for some constants $C_T, \beta_T > 0$. Therefore (A) holds. In the present framework equation (3.7) may be written in the form

$$\frac{\partial u}{\partial t}(t,\phi) = \frac12\langle D^2u(t,\phi)\tau, \tau\rangle + \langle A\phi + G(\phi), Du(t,\phi)\rangle - \delta(\phi)u(t,\phi), \qquad u(0,\phi) = F(\phi), \quad \phi \in \mathrm{dom}(A). \tag{3.16}$$

We shall need the finite dimensional parabolic PDE

$$\frac{\partial h}{\partial t}(t,x_1,\ldots,x_n) + \frac12\sum_{i,j=1}^n b_i^*(t)b_j(t)\,x_ix_j\,\frac{\partial^2h}{\partial x_i\partial x_j}(t,x_1,\ldots,x_n) = 0 \tag{3.17}$$

with the terminal condition $h(T,x_1,\ldots,x_n) = h_0(x_1,\ldots,x_n)$ and

$$b_i^*(t)b_j(t) = \int_{T-t}^{T_i-t}\tau^*(x)\,dx\int_{T-t}^{T_j-t}\tau(x)\,dx.$$

Equation (3.17) has a unique solution for every measurable terminal condition $h_0$ with linear growth. Let

$$F_{T,T_i}(t,\phi) = \exp\left(-\langle S(t)I_{T,T_i}, \phi\rangle\right),$$

where $I_{T,T_i}$ is an indicator function of the interval $[T,T_i]$.

Theorem 3.7 If the function $U(t,x_1,\ldots,x_n)$ is a solution to (3.17) with the terminal condition $U_0(x_1,\ldots,x_n)$ then the function

$$u(t,\phi) = B_T(t,\phi)\,U\left(t, F_{T,T_1}(t,\phi), \ldots, F_{T,T_n}(t,\phi)\right)$$

is a solution to the Cauchy problem (3.6) with the terminal condition

$$u(T,\phi) = U_0\left(B_{T_1}(T,\phi), \ldots, B_{T_n}(T,\phi)\right).$$

Proof It is enough to consider the case $n = d = 1$. The general argument is exactly the same. In view of Proposition 3.5 we need to show that the function

$$R(t,\phi) = U\left(t, F_{T,T_1}(t,\phi), \ldots, F_{T,T_n}(t,\phi)\right) \tag{3.18}$$

is a solution to equation (3.10). Note first that

$$\frac{d}{dt}F_{T,T_1}(t,\phi) = \left(\phi(T_1-t) - \phi(T-t)\right)F_{T,T_1}(t,\phi),$$

$$DF_{T,T_1}(t,\phi) = -F_{T,T_1}(t,\phi)\,l_t$$

with $l_t = I_{[T-t,T_1-t]}$ and

$$D^2F_{T,T_1}(t,\phi) = F_{T,T_1}(t,\phi)\,l_t \otimes l_t.$$


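Hence, denoting $l = I_{[0,T-t]}$, we find that for $\phi \in \operatorname{dom}(A)$
\[
\frac{\partial R}{\partial t}(t,\phi) = \frac{\partial U}{\partial t}\big(t,F_{T,T_1}(t,\phi)\big) + F_{T,T_1}(t,\phi)\big(\phi(T_1-t)-\phi(T-t)\big)\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big) \tag{3.19}
\]
and
\[
DR(t,\phi) = -F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\,l_t.
\]
Hence
\[
\langle DR(t,\phi),\tau\rangle = -F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\,\langle l_t,\tau\rangle \tag{3.20}
\]
and
\[
\begin{aligned}
\langle DR(t,\phi), A\phi + G_\sigma\rangle
&= -F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\,\langle l_t, A\phi + G_\sigma\rangle \\
&= -F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\big(\phi(T_1-t)-\phi(T-t)\big) \\
&\quad - F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\int_{T-t}^{T_1-t}\frac12\frac{d}{dx}\Big(\int_0^x \tau(u)\,du\Big)^2 dx \\
&= -F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\big(\phi(T_1-t)-\phi(T-t)\big) \\
&\quad - \frac12 F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\bigg[\Big(\int_0^{T_1-t}\tau(u)\,du\Big)^2 - \Big(\int_0^{T-t}\tau(u)\,du\Big)^2\bigg].
\end{aligned}
\]
Since $\int_0^{T_1-t}\tau(u)\,du = \langle\tau,l\rangle + \langle\tau,l_t\rangle$ and $\int_0^{T-t}\tau(u)\,du = \langle\tau,l\rangle$, the difference of squares equals $\langle\tau,l_t\rangle^2 + 2\langle\tau,l\rangle\langle\tau,l_t\rangle$. Thereby
\[
\begin{aligned}
\langle DR(t,\phi), A\phi + G_\sigma\rangle
&= -F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\big(\phi(T_1-t)-\phi(T-t)\big) \\
&\quad - \frac12 F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\,\langle\tau,l_t\rangle^2 \\
&\quad - F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\,\langle\tau,l\rangle\langle\tau,l_t\rangle.
\end{aligned} \tag{3.21}
\]
Next
\[
D^2R(t,\phi) = F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\,l_t\otimes l_t + F_{T,T_1}^2(t,\phi)\,\frac{\partial^2 U}{\partial x^2}\big(t,F_{T,T_1}(t,\phi)\big)\,l_t\otimes l_t.
\]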


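Hence
\[
\langle D^2R(t,\phi)\tau,\tau\rangle = F_{T,T_1}(t,\phi)\,\frac{\partial U}{\partial x}\big(t,F_{T,T_1}(t,\phi)\big)\,\langle l_t,\tau\rangle^2 + F_{T,T_1}^2(t,\phi)\,\frac{\partial^2 U}{\partial x^2}\big(t,F_{T,T_1}(t,\phi)\big)\,\langle l_t,\tau\rangle^2. \tag{3.22}
\]
Now, taking into account (3.19), (3.20), (3.21) and (3.22) we find that
\[
\begin{aligned}
&\frac{\partial R}{\partial t}(t,\phi) + \frac12\langle D^2R_T(t,\phi)\tau(\phi),\tau(\phi)\rangle + \langle DR_T(t,\phi), A\phi + G_\sigma(\phi)\rangle - \langle DR_T(t,\phi),\tau(\phi)\rangle\,\langle\tau(\phi), S(t)I_T\rangle \\
&\qquad = \frac{\partial U}{\partial t}\big(t,F_{T,T_1}(t,\phi)\big) + \frac12 F_{T,T_1}^2(t,\phi)\,\frac{\partial^2 U}{\partial x^2}\big(t,F_{T,T_1}(t,\phi)\big)\,\langle l_t,\tau\rangle^2,
\end{aligned}
\]
where $R(t,\phi)$ is defined by (3.18). Therefore, by (3.17) the function $R$ satisfies equation (3.10) and the theorem follows.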

References

Black, F. and Scholes, M. (1973), The pricing of options and corporate liabilities, J. Political Economy 81 637–59
Brace, A., Gątarek, D. and Musiela, M. (1997), The market model of interest rate dynamics, Math. Finance 7 127–54
Brace, A. and Musiela, M. (1994), A multifactor Gauss–Markov implementation of Heath, Jarrow and Morton, Math. Finance 2 259–83
Da Prato, G. and Zabczyk, J. (1992), Stochastic equations in infinite dimensions, Cambridge University Press
Goldys, B., Musiela, M. and Sondermann, D. (1995), Lognormality of rates and term structure models, preprint, UNSW
Gątarek, D. and Święch, A. (1997), Optimal stopping in Hilbert spaces and pricing of American options, a preprint
Hamza, K. and Klebaner, F.C. (1995), A stochastic partial differential equation for term structure of interest rates, a preprint
Harrison, J.M. and Pliska, S.R. (1981), Martingales and stochastic integrals in the theory of continuous trading, Stochastic Process. Appl. 11 215–60
Heath, D., Jarrow, R. and Morton, A. (1992), Bond pricing and the term structure of interest rates: a new methodology, Econometrica 61(1) 77–105
Kennedy, P.D. (1994), The term structure of interest rates as a Gaussian Markov field, Math. Finance 4 247–58
Musiela, M. (1993), Stochastic PDEs and term structure models, Journées Internationales de Finance, IGR-AFFI, La Baule
Santa-Clara, P. and Sornette, D. (1997), The dynamics of the forward interest rate curve with stochastic string shocks, preprint, UCLA

10 Modelling of Forward Libor and Swap Rates

Marek Rutkowski

1 Introduction

The last decade was marked by a rapidly growing interest in the arbitrage-free modelling of the bond market. Undoubtedly, one of the major achievements in this area was a new approach to term structure modelling proposed by Heath, Jarrow and Morton in their work published in 1992, commonly known as the HJM methodology. One of its main features is that it covers a large variety of previously proposed models and provides a unified approach to the modelling of instantaneous interest rates and to the valuation of interest-rate sensitive derivatives. Let us give a very concise description of the HJM approach (for a detailed account we refer, for instance, to Chapter 13 in Musiela and Rutkowski (1997a)).

The HJM methodology is based on an exogenous specification of the dynamics of instantaneous, continuously compounded forward rates $f(t,T)$. For any fixed maturity $T \le T^*$, the dynamics of the forward rate $f(t,T)$ are
\[
df(t,T) = \alpha(t,T)\,dt + \sigma(t,T)\cdot dW_t,
\]
where $\alpha$ and $\sigma$ are adapted stochastic processes with values in $\mathbb{R}$ and $\mathbb{R}^d$, respectively, and $W$ is a $d$-dimensional standard Brownian motion with respect to the underlying probability measure $\mathbb{P}$, which plays the role of the real-world probability. More formally, for every fixed $T \le T^*$, where $T^* > 0$ is the horizon date, we have
\[
f(t,T) = f(0,T) + \int_0^t \alpha(u,T)\,du + \int_0^t \sigma(u,T)\cdot dW_u
\]
for some Borel-measurable function $f(0,\cdot) : [0,T^*] \to \mathbb{R}$ and stochastic processes $\alpha(\cdot,T)$ and $\sigma(\cdot,T)$. Let us notice that, for any fixed maturity date $T \le T^*$, the initial condition $f(0,T)$ is determined by the current value of the continuously compounded forward rate for the future date $T$ which prevails at time 0. In practical terms, the function $f(0,T)$ is determined by the current yield curve, which can be estimated on the basis of observed market prices of bonds (and other relevant instruments). Let us denote by $B(t,T)$ the price at time $t \le T$ of a unit zero-coupon bond which matures at the date $T \le T^*$. In the present setup the price $B(t,T)$ can be recovered from the formula
\[
B(t,T) = \exp\Big(-\int_t^T f(t,u)\,du\Big).
\]
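As a small numerical illustration of the bond price formula above, the following sketch integrates a forward curve with the trapezoidal rule. The function name `bond_price`, the `steps` parameter and the two sample curves are assumptions made for this example only; they do not come from the text.

```python
import math

def bond_price(forward_curve, t, T, steps=1000):
    """Approximate B(t, T) = exp(-integral_t^T f(t, u) du) by the trapezoidal rule.

    forward_curve: callable u -> f(t, u), the forward curve observed at time t
    (a hypothetical input; in practice it would be fitted to market data).
    """
    if T < t:
        raise ValueError("maturity T must not precede t")
    if T == t:
        return 1.0
    h = (T - t) / steps
    # Trapezoidal rule: half weight on the two end points, full weight inside.
    total = 0.5 * (forward_curve(t) + forward_curve(T))
    for i in range(1, steps):
        total += forward_curve(t + i * h)
    return math.exp(-total * h)

# Flat 5% curve and a linearly increasing curve, both seen at t = 0.
flat = bond_price(lambda u: 0.05, 0.0, 2.0)
sloped = bond_price(lambda u: 0.03 + 0.01 * u, 0.0, 2.0)
```

For the flat curve the integral is $0.05 \times 2$, and for the linear curve it is $0.06 + 0.02$, so both results can be compared with the exact exponentials.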

The problem of the absence of arbitrage opportunities in the bond market can be formulated in terms of the existence of a suitably defined martingale measure. It appears that in an arbitrage-free setting – that is, under the martingale measure – the drift coefficient $\alpha$ in the dynamics of the instantaneous forward rate is uniquely determined by the volatility coefficient $\sigma$, and a stochastic process which can be interpreted as the market price of the interest-rate risk. If we denote by $\mathbb{P}^*$ the martingale measure for the bond market, and by $W^*$ the associated standard Brownian motion, then
\[
dB(t,T) = B(t,T)\big(r_t\,dt + b(t,T)\cdot dW_t^*\big),
\]
where $r_t = f(t,t)$ is the short-term interest rate, and the bond price volatility $b(t,T)$ satisfies
\[
b(t,T) = -\int_t^T \sigma(t,u)\,du. \tag{1.1}
\]

Furthermore, it appears that in the special case when the coefficient $\sigma$ is a deterministic function, the valuation formulae for interest rate-sensitive derivatives are independent of the choice of the risk premium. In this sense, the choice of a particular model from the broad class of HJM models hinges uniquely on the specification of the volatility coefficient $\sigma$.

The HJM methodology has proved very successful from both the theoretical and the practical viewpoint. Since the HJM approach to term structure modelling is based on arbitrage-free dynamics of the instantaneous continuously compounded forward rates, it requires a certain degree of smoothness with respect to the tenor of the bond prices and their volatilities. For this reason, working with such models is not always convenient. An alternative construction of an arbitrage-free family of bond prices, making no reference to the instantaneous rates, is in some circumstances more suitable. The first step in this direction was taken by Sandmann and Sondermann (1993), who focused on the effective annual interest rate. This approach was further developed in ground-breaking papers by Miltersen et al. (1997) and Brace et al. (1997), who proposed to model instead the family of forward Libor rates. The main goal was to produce an arbitrage-free term structure model which would support the common practice of pricing such interest-rate derivatives as caps and swaptions through a suitable version of Black's formula. This practical requirement enforces the lognormality of the forward Libor (or swap) rate under the corresponding forward martingale measure.

It is interesting to notice that Brace et al. (1997) parametrize their version of the lognormal forward Libor model introduced by Miltersen et al. (1997) with a piecewise constant volatility function. To analyse the model in the HJM framework, however, they need to consider smooth volatility functions. The backward induction approach to the modelling of forward Libor and swap rates developed in Musiela and Rutkowski (1997a) and Jamshidian (1997) overcomes this technical difficulty. In addition, in contrast to the previous papers, it also allows for the modelling of forward Libor (and swap) rates associated with accrual periods of differing lengths.

It should be stressed that a similar (but not identical) approach to the modelling of market rates was developed in a series of papers by Hunt et al. (1996, 2000) and Hunt and Kennedy (1996, 1997). Since special emphasis is put here on the existence of the underlying low-dimensional Markov process that directly governs the dynamics of interest rates, this alternative approach is termed the Markov-functional approach. This property leads to a considerable simplification in numerical procedures associated with the model's implementation. Another important feature of this approach is its ability to provide a perfect fit to market prices of a given family of interest-rate options.

2 Modelling of forward Libor rates

In this section, we present various approaches to the modelling of forward Libor rates. We focus here on the model's construction, its basic properties, and the valuation of the most typical derivatives. For further details, the interested reader is referred to the original papers: Musiela and Sondermann (1993), Sandmann and Sondermann (1993), Goldys et al. (1994), Sandmann et al. (1995), Brace et al. (1997), Jamshidian (1997), Miltersen et al. (1997), Musiela and Rutkowski (1997b), Rady (1997), Sandmann and Sondermann (1997), Rutkowski (1998, 1999), Glasserman and Kou (1999), and Yasuoka (1999). The issues related to the model's implementation are extensively treated in Brace (1996), Andersen and Andreasen (1997), Sidenius (1997), Brace et al. (1998), Musiela and Sawa (1998), Hull and White (1999), Schlögl (1999), Uratani and Utsunomiya (1999), Yasuoka (1999), Lotz and Schlögl (2000), Glasserman and Zhao (2000), Brace and Womersley (2000), and Dun et al. (2000).


2.1 Forward and futures Libor rates

Our first task is to examine those properties of forward and futures contracts related to the notion of the Libor rate which are universal; that is, which do not rely on specific assumptions imposed on a particular model of the term structure of interest rates. To this end, we fix an index $j$, and we consider various interest-rate sensitive derivatives related to the period $[T_j, T_{j+1}]$. To be more specific, we shall focus in this section on single-period forward swaps – that is, forward rate agreements.

We need to introduce some notation. We assume that we are given a prespecified collection of reset/settlement dates $0 < T_0 < T_1 < \dots < T_n = T^*$, referred to as the tenor structure. Also, we denote $\delta_j = T_j - T_{j-1}$ for $j = 1,\dots,n$. We write $B(t,T_j)$ to denote the price at time $t$ of a $T_j$-maturity zero-coupon bond. $\mathbb{P}^*$ is the spot martingale measure, while for any $j = 0,\dots,n$ we write $\mathbb{P}_{T_j}$ to denote the forward martingale measure associated with the date $T_j$. The corresponding $d$-dimensional Brownian motions are denoted by $W^*$ and $W^{T_j}$, respectively. Also, we write $F_B(t,T,U) = B(t,T)/B(t,U)$ so that
\[
F_B(t,T_{j+1},T_j) = \frac{B(t,T_{j+1})}{B(t,T_j)}, \qquad \forall\, t \in [0,T_j],
\]

is the forward price at time $t$ of the $T_{j+1}$-maturity zero-coupon bond for the settlement date $T_j$. We use the symbol $\pi_t(X)$ to denote the value (i.e., the arbitrage price) at time $t$ of a European contingent claim $X$. Finally, we shall use the letter $\mathcal{E}$ for the Doléans exponential, for instance,
\[
\mathcal{E}_t\Big(\int_0^{\cdot} \gamma_u \cdot dW_u^*\Big) = \exp\Big(\int_0^t \gamma_u \cdot dW_u^* - \frac12 \int_0^t |\gamma_u|^2\,du\Big),
\]
where the dot '$\cdot$' and $|\cdot|$ stand for the inner product and Euclidean norm in $\mathbb{R}^d$, respectively.

2.1.1 Single-period swaps settled in arrears

Let us first consider a single-period swap agreement settled in arrears; i.e., with the reset date $T_j$ and the settlement date $T_{j+1}$ (multi-period interest rate swaps are examined in Section 3). By the contractual features, the long party pays $\delta_{j+1}\kappa$ and receives $B^{-1}(T_j,T_{j+1}) - 1$ at time $T_{j+1}$. Equivalently, he pays an amount $Y_1 = 1 + \delta_{j+1}\kappa$ and receives $Y_2 = B^{-1}(T_j,T_{j+1})$ at this date. The values at time $t \le T_j$ of these payoffs are
\[
\pi_t(Y_1) = B(t,T_{j+1})\big(1 + \delta_{j+1}\kappa\big), \qquad \pi_t(Y_2) = B(t,T_j).
\]
The second equality above is trivial, since the payoff $Y_2$ is equivalent to the unit payoff at time $T_j$. Consequently, for any fixed $t \le T_j$, the value of the forward swap rate, which makes the contract worthless at time $t$, can be found by solving for $\kappa = \kappa(t,T_j,T_{j+1})$ the following equation:
\[
\pi_t(Y_2) - \pi_t(Y_1) = B(t,T_j) - B(t,T_{j+1})\big(1 + \delta_{j+1}\kappa\big) = 0.
\]
It is thus apparent that
\[
\kappa(t,T_j,T_{j+1}) = \frac{B(t,T_j) - B(t,T_{j+1})}{\delta_{j+1}B(t,T_{j+1})}, \qquad \forall\, t \in [0,T_j].
\]
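The formula for $\kappa(t,T_j,T_{j+1})$ translates directly into a computation on a discount curve. The sketch below is illustrative only: the function name `forward_libor` and the sample bond prices are assumptions, chosen so that the resulting rates are easy to check by hand.

```python
def forward_libor(bond_prices, tenor_dates):
    """Return [kappa(0, T_j, T_{j+1})] for j = 0..n-1.

    bond_prices[j] is B(0, T_j) and tenor_dates[j] is T_j, for j = 0..n.
    """
    if len(bond_prices) != len(tenor_dates):
        raise ValueError("need one bond price per tenor date")
    rates = []
    for j in range(len(tenor_dates) - 1):
        delta = tenor_dates[j + 1] - tenor_dates[j]        # delta_{j+1}
        b_near, b_far = bond_prices[j], bond_prices[j + 1]
        rates.append((b_near - b_far) / (delta * b_far))
    return rates

# Hypothetical tenor structure with semi-annual accrual periods.
dates = [0.5, 1.0, 1.5]
bonds = [1 / 1.02, 1 / (1.02 * 1.025), 1 / (1.02 * 1.025 * 1.03)]
rates = forward_libor(bonds, dates)

# The single-period swap struck at the computed rate is worthless at time 0.
swap_value = bonds[0] - bonds[1] * (1 + 0.5 * rates[0])
```

With these inputs the ratios $B(0,T_j)/B(0,T_{j+1})$ are 1.025 and 1.03, so the two rates are 5% and 6% for accrual periods of length 0.5.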

Note that the forward swap rate $\kappa(t,T_j,T_{j+1})$ coincides with the forward Libor rate $L(t,T_j)$ which, by the market convention, is set to satisfy
\[
1 + \delta_{j+1}L(t,T_j) = \frac{B(t,T_j)}{B(t,T_{j+1})} = \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) \tag{2.1}
\]
for every $t \in [0,T_j]$. Let us notice that the last equality is a consequence of the definition of the forward measure $\mathbb{P}_{T_{j+1}}$. We conclude that in order to determine the forward Libor rate $L(\cdot,T_j)$, it is enough to find the forward price $F_X(t,T_{j+1})$ at time $t$ of the contingent claim $X = B^{-1}(T_j,T_{j+1})$ in the forward contract that settles at time $T_{j+1}$. Indeed, it is well known (see, for instance, Musiela and Rutkowski (1997a)) that
\[
F_X(t,T_{j+1}) = \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big).
\]
Furthermore, it is evident that the process $L(\cdot,T_j)$ necessarily follows a martingale under the forward probability measure $\mathbb{P}_{T_{j+1}}$. Recall that in the Heath–Jarrow–Morton framework, we have, under $\mathbb{P}_{T_{j+1}}$,
\[
dF_B(t,T_j,T_{j+1}) = F_B(t,T_j,T_{j+1})\big(b(t,T_j) - b(t,T_{j+1})\big)\cdot dW_t^{T_{j+1}}, \tag{2.2}
\]
where, for each maturity date $T$, the process $b(\cdot,T)$ represents the price volatility of the $T$-maturity zero-coupon bond. On the other hand, if the process $L(\cdot,T_j)$ is strictly positive, it can be shown to admit the following representation$^1$
\[
dL(t,T_j) = L(t,T_j)\,\lambda(t,T_j)\cdot dW_t^{T_{j+1}},
\]
where $\lambda(\cdot,T_j)$ is an adapted stochastic process which satisfies mild integrability conditions. Combining the last two formulae with (2.1), we arrive at the following fundamental relationship, which plays an essential role in the construction of the lognormal model of forward Libor rates,
\[
\frac{\delta_{j+1}L(t,T_j)}{1 + \delta_{j+1}L(t,T_j)}\,\lambda(t,T_j) = b(t,T_j) - b(t,T_{j+1}), \qquad \forall\, t \in [0,T_j]. \tag{2.3}
\]

$^1$ This representation is a consequence of the martingale representation property of the standard Brownian motion.


For instance, in the construction which is based on the backward induction, relationship (2.3) will allow us to determine the forward measure for the date $T_j$, provided that $\mathbb{P}_{T_{j+1}}$, $W^{T_{j+1}}$ and the volatility $\lambda(t,T_j)$ of the forward Libor rate $L(\cdot,T_j)$ are known. (One may assume, for instance, that $\lambda(\cdot,T_j)$ is a prespecified deterministic function.) Recall that in the Heath–Jarrow–Morton framework$^2$ the Radon–Nikodým density of $\mathbb{P}_{T_j}$ with respect to $\mathbb{P}_{T_{j+1}}$ is known to satisfy
\[
\frac{d\mathbb{P}_{T_j}}{d\mathbb{P}_{T_{j+1}}} = \mathcal{E}_{T_j}\Big(\int_0^{\cdot}\big(b(t,T_j) - b(t,T_{j+1})\big)\cdot dW_t^{T_{j+1}}\Big). \tag{2.4}
\]
In view of (2.3), we thus have
\[
\frac{d\mathbb{P}_{T_j}}{d\mathbb{P}_{T_{j+1}}} = \mathcal{E}_{T_j}\Big(\int_0^{\cdot}\frac{\delta_{j+1}L(t,T_j)}{1 + \delta_{j+1}L(t,T_j)}\,\lambda(t,T_j)\cdot dW_t^{T_{j+1}}\Big).
\]
For our further purposes, it is also useful to observe that this density admits the following representation
\[
\frac{d\mathbb{P}_{T_j}}{d\mathbb{P}_{T_{j+1}}} = cF_B(T_j,T_j,T_{j+1}) = c\big(1 + \delta_{j+1}L(T_j,T_j)\big), \qquad \mathbb{P}_{T_{j+1}}\text{-a.s.}, \tag{2.5}
\]
where $c > 0$ is the normalizing constant, and thus
\[
\frac{d\mathbb{P}_{T_j}}{d\mathbb{P}_{T_{j+1}}}\bigg|_{\mathcal{F}_t} = cF_B(t,T_j,T_{j+1}) = c\big(1 + \delta_{j+1}L(t,T_j)\big), \qquad \mathbb{P}_{T_{j+1}}\text{-a.s.}
\]
Finally, the dynamics of the process $L(\cdot,T_j)$ under the probability measure $\mathbb{P}_{T_j}$ are given by a somewhat involved stochastic differential equation
\[
dL(t,T_j) = L(t,T_j)\bigg(\frac{\delta_{j+1}L(t,T_j)\,|\lambda(t,T_j)|^2}{1 + \delta_{j+1}L(t,T_j)}\,dt + \lambda(t,T_j)\cdot dW_t^{T_j}\bigg).
\]
As we shall see in what follows, it is nevertheless not hard to determine the probability law of $L(\cdot,T_j)$ under the forward measure $\mathbb{P}_{T_j}$ – at least in the case of the deterministic volatility $\lambda(\cdot,T_j)$ of the forward Libor rate.

2.1.2 Single-period swaps settled in advance

Consider now a similar swap which is, however, settled in advance – that is, at time $T_j$. Our first goal is to determine the forward swap rate implied by such a contract. Note that under the present assumptions, the long party (formally) pays an amount $Y_1 = 1 + \delta_{j+1}\kappa$ and receives $Y_2 = B^{-1}(T_j,T_{j+1})$ at the settlement date $T_j$ (which coincides here with the reset date). The values at time $t \le T_j$ of these payoffs admit the following representations
\[
\pi_t(Y_1) = B(t,T_j)\big(1 + \delta_{j+1}\kappa\big), \qquad \pi_t(Y_2) = B(t,T_j)\,\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big).
\]

$^2$ See Heath et al. (1992) or Chapter 13 in Musiela and Rutkowski (1997a).


The value $\kappa = \hat\kappa(t,T_j,T_{j+1})$ of the modified forward swap rate, which makes the swap agreement settled in advance worthless at time $t$, can be found from the equality
\[
\pi_t(Y_2) - \pi_t(Y_1) = B(t,T_j)\Big(\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) - (1 + \delta_{j+1}\kappa)\Big) = 0.
\]
It is clear that
\[
\hat\kappa(t,T_j,T_{j+1}) = \delta_{j+1}^{-1}\Big(\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) - 1\Big).
\]
We are in a position to introduce the modified forward Libor rate $\tilde L(t,T_j)$ by setting, for every $t \in [0,T_j]$,
\[
\tilde L(t,T_j) := \delta_{j+1}^{-1}\Big(\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) - 1\Big).
\]
Let us make two remarks. First, it is clear that finding the modified forward Libor rate $\tilde L(\cdot,T_j)$ is formally equivalent to finding the forward price of the claim $B^{-1}(T_j,T_{j+1})$ for the settlement date $T_j$.$^3$ Second, it is useful to observe that
\[
\tilde L(t,T_j) = \mathbb{E}_{\mathbb{P}_{T_j}}\bigg(\frac{1 - B(T_j,T_{j+1})}{\delta_{j+1}B(T_j,T_{j+1})}\,\bigg|\,\mathcal{F}_t\bigg) = \mathbb{E}_{\mathbb{P}_{T_j}}\big(L(T_j,T_j) \mid \mathcal{F}_t\big). \tag{2.6}
\]
In particular, it is evident that at the reset date $T_j$ the two kinds of forward Libor rates introduced above coincide, since manifestly
\[
\tilde L(T_j,T_j) = \frac{1 - B(T_j,T_{j+1})}{\delta_{j+1}B(T_j,T_{j+1})} = L(T_j,T_j).
\]
To summarize, the "standard" forward Libor rate $L(\cdot,T_j)$ satisfies
\[
L(t,T_j) = \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big(L(T_j,T_j) \mid \mathcal{F}_t\big), \qquad \forall\, t \in [0,T_j],
\]
with the initial condition
\[
L(0,T_j) = \frac{B(0,T_j) - B(0,T_{j+1})}{\delta_{j+1}B(0,T_{j+1})}.
\]
On the other hand, for the modified Libor rate $\tilde L(\cdot,T_j)$ we have
\[
\tilde L(t,T_j) = \mathbb{E}_{\mathbb{P}_{T_j}}\big(\tilde L(T_j,T_j) \mid \mathcal{F}_t\big), \qquad \forall\, t \in [0,T_j],
\]
with the initial condition
\[
\tilde L(0,T_j) = \delta_{j+1}^{-1}\Big(\mathbb{E}_{\mathbb{P}_{T_j}}\big(B^{-1}(T_j,T_{j+1})\big) - 1\Big).
\]
The calculation of the right-hand side above involves not only the initial term structure, but also the volatilities of bond prices (for more details, we refer to Rutkowski (1998)).

$^3$ Recall that in the case of a forward Libor rate, the settlement date was $T_{j+1}$.
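To see how volatility enters the modified rate, consider the following sketch. It rests on an assumption that is not made in this subsection: that $L(\cdot,T_j)$ is lognormal under $\mathbb{P}_{T_{j+1}}$ with a constant scalar volatility $\sigma$. Under that assumption, (2.5) and (2.6) give $\tilde L(0,T_j) = \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big(L(T_j,T_j)(1+\delta_{j+1}L(T_j,T_j))\big)/(1+\delta_{j+1}L(0,T_j))$, and the second moment of a lognormal variable is $L(0,T_j)^2 e^{\sigma^2 T_j}$. The function name `modified_libor` and the sample inputs are hypothetical.

```python
import math

def modified_libor(libor0, delta, sigma, reset):
    """Modified (settled in advance) forward Libor rate at time 0.

    Assumes L(., T_j) is lognormal under the T_{j+1}-forward measure with
    constant scalar volatility `sigma`; `reset` is T_j and `delta` is the
    accrual period delta_{j+1}.
    """
    # E[L(T_j, T_j)^2] for a driftless lognormal process started at libor0.
    second_moment = libor0 ** 2 * math.exp(sigma ** 2 * reset)
    # Change of measure (2.5) with normalizing constant 1 / (1 + delta * libor0).
    return (libor0 + delta * second_moment) / (1.0 + delta * libor0)

no_vol = modified_libor(0.05, 0.5, 0.0, 2.0)
with_vol = modified_libor(0.05, 0.5, 0.2, 2.0)
```

With zero volatility the two rates agree, while a positive volatility pushes the modified rate slightly above the standard forward Libor rate, which is the dependence on volatilities mentioned above.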


2.1.3 Eurodollar futures contracts

The next object of our studies is the futures Libor rate. A Eurodollar futures contract is a futures contract in which the Libor rate plays the role of an underlying asset. By convention, at the contract's maturity date $T_j$, the quoted Eurodollar futures price, denoted by $E(T_j,T_j)$, is set to satisfy
\[
E(T_j,T_j) := 1 - \delta_{j+1}L(T_j,T_j).
\]
Equivalently, in terms of the zero-coupon bond price we have $E(T_j,T_j) = 2 - B^{-1}(T_j,T_{j+1})$. From the general theory, it follows that the Eurodollar futures price at time $t \le T_j$ equals
\[
E(t,T_j) := \mathbb{E}_{\mathbb{P}^*}\big(E(T_j,T_j) \mid \mathcal{F}_t\big) = 2 - \mathbb{E}_{\mathbb{P}^*}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big) \tag{2.7}
\]
(recall that $\mathbb{P}^*$ represents the spot martingale measure in a given model of the term structure). It is thus natural to introduce the concept of the futures Libor rate, associated with the Eurodollar futures contract, through the following definition.

Definition 2.1 Let $E(t,T_j)$ be the Eurodollar futures price at time $t$ for the settlement date $T_j$. The implied futures Libor rate $L^f(t,T_j)$ satisfies
\[
E(t,T_j) = 1 - \delta_{j+1}L^f(t,T_j), \qquad \forall\, t \in [0,T_j]. \tag{2.8}
\]

It follows immediately from (2.7)–(2.8) that the following equality is valid:
\[
1 + \delta_{j+1}L^f(t,T_j) = \mathbb{E}_{\mathbb{P}^*}\big(B^{-1}(T_j,T_{j+1}) \mid \mathcal{F}_t\big). \tag{2.9}
\]
Equivalently, we have
\[
L^f(t,T_j) = \mathbb{E}_{\mathbb{P}^*}\big(L(T_j,T_j) \mid \mathcal{F}_t\big) = \mathbb{E}_{\mathbb{P}^*}\big(\tilde L(T_j,T_j) \mid \mathcal{F}_t\big).
\]
Note that in any term structure model, the futures Libor rate necessarily follows a martingale under the spot martingale measure $\mathbb{P}^*$ (provided, of course, that $\mathbb{P}^*$ is well-defined in this model).

2.2 Lognormal models of forward Libor rates

We shall now describe alternative approaches to the modelling of forward Libor rates in continuous- and discrete-tenor setups.

2.2.1 The Miltersen–Sandmann–Sondermann approach

The first attempt to provide a rigorous construction of a lognormal model of forward Libor rates was made by Miltersen et al. (1997). The interested reader is referred also to Musiela and Sondermann (1993), Goldys et al. (1994), and Sandmann et al. (1995) for related previous studies. As a starting point in their approach, Miltersen et al. (1997) postulate that the forward Libor rate process $L(\cdot,T)$ satisfies
\[
dL(t,T) = \mu(t,T)\,dt + L(t,T)\,\lambda(t,T)\cdot dW_t^*,
\]
with a deterministic volatility function $\lambda(\cdot,T) : [0,T] \to \mathbb{R}^d$. It is not difficult to deduce from the last formula that the forward price of a zero-coupon bond satisfies
\[
dF(t,T+\delta,T) = -F(t,T+\delta,T)\big(1 - F(t,T+\delta,T)\big)\,\lambda(t,T)\cdot dW_t^T.
\]
Subsequently, they focus on the partial differential equation satisfied by the function $v = v(t,x)$, which expresses the forward price of the bond option in terms of the forward bond price. It is interesting to note that the PDE (2.10) was previously solved by Rady and Sandmann (1994) who worked within a different framework, however.$^4$ The PDE for the option's price is
\[
\frac{\partial v}{\partial t} + \frac12\,|\lambda(t,T)|^2\,x^2(1-x)^2\,\frac{\partial^2 v}{\partial x^2} = 0 \tag{2.10}
\]
with the terminal condition $v(T,x) = (K - x)^+$. As a result, Miltersen et al. (1997) obtained not only the closed-form solution for the price of a bond option (this was already achieved in Rady and Sandmann (1994)), but also the "market formula" for the caplet's price. The rigorous approach to the problem of existence of such a model was presented by Brace et al. (1997), who also worked within the continuous-time Heath–Jarrow–Morton framework.

$^4$ In fact, they were concerned with the valuation of options on zero-coupon bonds for the term structure model put forward by Bühler and Käsler (1989).

2.2.2 Brace–Gątarek–Musiela approach

To formally introduce the notion of a forward Libor rate, we assume that we are given a family $B(t,T)$ of bond prices, and thus also the collection $F_B(t,T,U)$ of forward processes. In contrast to the previous section, we shall now assume that a strictly positive real number $\delta < T^*$, which represents the length of the accrual period, is fixed throughout. By definition, the forward $\delta$-Libor rate $L(t,T)$ for the future date $T \le T^* - \delta$ prevailing at time $t$ is given by the conventional market formula
\[
1 + \delta L(t,T) = F_B(t,T,T+\delta), \qquad \forall\, t \in [0,T]. \tag{2.11}
\]
The forward Libor rate $L(t,T)$ represents the add-on rate prevailing at time $t$ over the future time interval $[T,T+\delta]$. We can also re-express $L(t,T)$ directly in terms of bond prices, as for any $T \in [0,T^*-\delta]$, we have
\[
1 + \delta L(t,T) = \frac{B(t,T)}{B(t,T+\delta)}, \qquad \forall\, t \in [0,T]. \tag{2.12}
\]
In particular, the initial term structure of forward Libor rates satisfies
\[
L(0,T) = \delta^{-1}\Big(\frac{B(0,T)}{B(0,T+\delta)} - 1\Big). \tag{2.13}
\]

Given a family $F_B(t,T,T^*)$ of forward processes, it is not hard to derive the dynamics of the associated family of forward Libor rates. For instance, one finds that under the forward measure $\mathbb{P}_{T+\delta}$, we have
\[
dL(t,T) = \delta^{-1}F_B(t,T,T+\delta)\,\gamma(t,T,T+\delta)\cdot dW_t^{T+\delta},
\]
where $\mathbb{P}_{T+\delta}$ is the forward measure for the date $T+\delta$, and the associated Wiener process $W^{T+\delta}$ equals
\[
W_t^{T+\delta} = W_t^* - \int_0^t b(u,T+\delta)\,du, \qquad \forall\, t \in [0,T+\delta].
\]
Put another way, the process $L(\cdot,T)$ solves the equation
\[
dL(t,T) = \delta^{-1}\big(1 + \delta L(t,T)\big)\,\gamma(t,T,T+\delta)\cdot dW_t^{T+\delta}, \tag{2.14}
\]
subject to the initial condition (2.13). Suppose that forward Libor rates $L(t,T)$ are strictly positive. Then formula (2.14) can be rewritten as follows:
\[
dL(t,T) = L(t,T)\,\lambda(t,T)\cdot dW_t^{T+\delta}, \tag{2.15}
\]
where for any $t \in [0,T]$
\[
\lambda(t,T) = \frac{1 + \delta L(t,T)}{\delta L(t,T)}\,\gamma(t,T,T+\delta). \tag{2.16}
\]

This shows that the collection of forward processes uniquely specifies the family of forward Libor rates. The construction of a model of forward Libor rates relies on the following assumptions.

(LR.1) For any maturity $T \le T^* - \delta$, we are given an $\mathbb{R}^d$-valued, bounded deterministic function$^5$ $\lambda(\cdot,T)$, which represents the volatility of the forward Libor rate process $L(\cdot,T)$.

(LR.2) We assume a strictly decreasing and strictly positive initial term structure $B(0,T)$, $T \in [0,T^*]$. The associated initial term structure $L(0,T)$ of forward Libor rates satisfies, for every $T \in [0,T^*-\delta]$,
\[
L(0,T) = \frac{B(0,T) - B(0,T+\delta)}{\delta B(0,T+\delta)}. \tag{2.17}
\]

$^5$ Volatility $\lambda$ could well follow an adapted stochastic process; we deliberately focus here on a lognormal model of forward Libor rates in which $\lambda$ is deterministic.


To construct a model satisfying (LR.1)–(LR.2), Brace et al. (1997) place themselves in the Heath–Jarrow–Morton setup and they assume that for every $T \in [0,T^*]$, the volatility $b(t,T)$ vanishes for every $t \in [(T-\delta)\vee 0,\,T]$. In essence, the construction elaborated in Brace et al. (1997) is based on the forward induction, as opposed to the backward induction which we shall use in the next section. They start by postulating that the dynamics of $L(t,T)$ under the spot martingale measure $\mathbb{P}^*$ are governed by the following SDE:
\[
dL(t,T) = \mu(t,T)\,dt + L(t,T)\,\lambda(t,T)\cdot dW_t^*,
\]
where $\lambda$ is a deterministic function, and the drift coefficient $\mu$ is unspecified. Recall that the arbitrage-free dynamics of the instantaneous forward rate $f(t,T)$ are
\[
df(t,T) = \sigma(t,T)\cdot\sigma^*(t,T)\,dt + \sigma(t,T)\cdot dW_t^*,
\]
where
\[
\sigma^*(t,T) = \int_t^T \sigma(t,u)\,du = -b(t,T).
\]
On the other hand, the relationship (cf. (2.12))
\[
1 + \delta L(t,T) = \exp\Big(\int_T^{T+\delta} f(t,u)\,du\Big) \tag{2.18}
\]
is valid. Applying Itô's formula to both sides of (2.18), and comparing the diffusion terms, we find that
\[
\sigma^*(t,T+\delta) - \sigma^*(t,T) = \int_T^{T+\delta}\sigma(t,u)\,du = \frac{\delta L(t,T)}{1 + \delta L(t,T)}\,\lambda(t,T).
\]
To solve the last equation for $\sigma^*$ in terms of $L$, it is necessary to impose some sort of initial condition on $\sigma^*$. For instance, by setting $\sigma(t,T) = 0$ for $0 \le t \le T \le t + \delta$, we obtain the following relationship:
\[
b(t,T) = -\sigma^*(t,T) = -\sum_{k=1}^{[\delta^{-1}(T-t)]}\frac{\delta L(t,T-k\delta)}{1 + \delta L(t,T-k\delta)}\,\lambda(t,T-k\delta). \tag{2.19}
\]
The existence and uniqueness of solutions to SDEs which govern the instantaneous forward rate $f(t,T)$ and the forward Libor rate $L(t,T)$ for $\sigma^*$ given by (2.19) can be shown using forward induction. Taking this result for granted, and recalling that $W_t^{T+\delta} = W_t^* + \int_0^t \sigma^*(u,T+\delta)\,du$, we conclude that $L(t,T)$ satisfies, under the spot martingale measure $\mathbb{P}^*$,
\[
dL(t,T) = L(t,T)\,\sigma^*(t,T+\delta)\cdot\lambda(t,T)\,dt + L(t,T)\,\lambda(t,T)\cdot dW_t^*.
\]
In this way, Brace et al. (1997) are able to completely specify their model of forward Libor rates.
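The sum in (2.19) is straightforward to evaluate once the forward Libor rates and their volatilities at time $t$ are available. The sketch below treats the one-dimensional case $d = 1$; the function name `bond_volatility` and the callables `libor` and `libor_vol` are hypothetical stand-ins for whatever representation of $L(t,\cdot)$ and $\lambda(t,\cdot)$ an implementation would use.

```python
import math

def bond_volatility(t, T, delta, libor, libor_vol):
    """Evaluate b(t, T) from (2.19) in the scalar case d = 1.

    libor(t, S) returns L(t, S) and libor_vol(t, S) returns lambda(t, S);
    both are user-supplied callables (assumptions of this sketch).
    """
    n_terms = math.floor((T - t) / delta)      # [delta^{-1} (T - t)]
    total = 0.0
    for k in range(1, n_terms + 1):
        S = T - k * delta                      # maturities T - k * delta
        L = libor(t, S)
        total += delta * L / (1.0 + delta * L) * libor_vol(t, S)
    return -total

# Flat 4% Libor curve with 20% volatility and quarterly accrual periods.
flat_libor = lambda t, S: 0.04
flat_vol = lambda t, S: 0.2
b_long = bond_volatility(0.0, 1.1, 0.25, flat_libor, flat_vol)
b_short = bond_volatility(0.0, 0.2, 0.25, flat_libor, flat_vol)
```

For a bond with less than one accrual period to maturity the sum is empty, which reproduces the convention that $b(t,T)$ vanishes for $t \in [(T-\delta)\vee 0,\,T]$.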


2.2.3 Musiela–Rutkowski approach

In this section, we describe an alternative approach to the modelling of forward Libor rates; the construction presented below is a slight modification of that given by Musiela and Rutkowski (1997b). Let us start by introducing some notation. We assume that we are given a prespecified collection of reset/settlement dates $0 < T_0 < T_1 < \dots < T_n = T^*$, referred to as the tenor structure (by convention, $T_{-1} = 0$). Let us denote $\delta_j = T_j - T_{j-1}$ for $j = 0,\dots,n$. Then obviously $T_j = \sum_{i=0}^{j}\delta_i$ for every $j = 0,\dots,n$. We find it convenient to denote, for $m = 0,\dots,n$,
\[
T_m^* = T^* - \sum_{j=n-m+1}^{n}\delta_j = T_{n-m}.
\]
For any $j = 0,\dots,n-1$, we define the forward Libor rate $L(\cdot,T_j)$ by setting
\[
L(t,T_j) = \frac{B(t,T_j) - B(t,T_{j+1})}{\delta_{j+1}B(t,T_{j+1})}, \qquad \forall\, t \in [0,T_j].
\]

Definition 2.2 For any $j = 0,\dots,n$, a probability measure $\mathbb{P}_{T_j}$ on $(\Omega,\mathcal{F}_{T_j})$, equivalent to $\mathbb{P}$, is said to be the forward Libor measure for the date $T_j$ if, for every $k = 0,\dots,n$ the relative bond price
\[
U_{n-j+1}(t,T_k) := \frac{B(t,T_k)}{\delta_j B(t,T_j)}, \qquad \forall\, t \in [0,T_k \wedge T_j],
\]
follows a local martingale under $\mathbb{P}_{T_j}$.

It is clear that the notion of forward Libor measure is in fact identical with that of a forward probability measure for a given date. Also, it is trivial to observe that the forward Libor rate $L(\cdot,T_j)$ necessarily follows a local martingale under the forward Libor measure for the date $T_{j+1}$. If, in addition, it is a strictly positive process, the existence of the associated volatility process can be justified by standard arguments. In our further development, we shall go the other way around; that is, we will assume that for any date $T_j$, the volatility $\lambda(\cdot,T_j)$ of the forward Libor rate $L(\cdot,T_j)$ is exogenously given. In principle, it can be a deterministic $\mathbb{R}^d$-valued function of time, an $\mathbb{R}^d$-valued function of the underlying forward Libor rates, or it can follow a $d$-dimensional adapted stochastic process. For simplicity, we assume throughout that the volatilities of forward Libor rates are bounded processes (or functions). To be more specific, we make the following standing assumptions.

Assumptions (LR) We are given a family of bounded adapted processes $\lambda(\cdot,T_j)$, $j = 0,\dots,n-1$, which represent the volatilities of forward Libor rates $L(\cdot,T_j)$. In addition, we are given an initial term structure of interest rates, specified by a family $B(0,T_j)$, $j = 0,\dots,n$, of bond prices. We assume here that $B(0,T_j) > B(0,T_{j+1})$ for $j = 0,\dots,n-1$.

Our aim is to construct a family $L(\cdot,T_j)$, $j = 0,\dots,n-1$, of forward Libor rates, a collection of mutually equivalent probability measures $\mathbb{P}_{T_j}$, $j = 1,\dots,n$, and a family $W^{T_j}$, $j = 1,\dots,n$, of processes in such a way that:

(i) for any $j = 1,\dots,n$ the process $W^{T_j}$ follows a $d$-dimensional standard Brownian motion under the probability measure $\mathbb{P}_{T_j}$,

(ii) for any $j = 0,\dots,n-1$, the forward Libor rate $L(\cdot,T_j)$ satisfies the SDE
\[
dL(t,T_j) = L(t,T_j)\,\lambda(t,T_j)\cdot dW_t^{T_{j+1}}, \qquad \forall\, t \in [0,T_j], \tag{2.20}
\]
with the initial condition
\[
L(0,T_j) = \frac{B(0,T_j) - B(0,T_{j+1})}{\delta_{j+1}B(0,T_{j+1})}.
\]

As already mentioned, the construction of the model is based on backward induction, therefore we start by defining the forward Libor rate with the longest maturity, i.e., $T_{n-1}$. We postulate that $L(\cdot,T_{n-1}) = L(\cdot,T_1^*)$ is governed under the underlying probability measure $\mathbb{P}$ by the following SDE$^6$
\[
dL(t,T_1^*) = L(t,T_1^*)\,\lambda(t,T_1^*)\cdot dW_t,
\]
with the initial condition
\[
L(0,T_1^*) = \frac{B(0,T_1^*) - B(0,T^*)}{\delta_n B(0,T^*)}.
\]
Put another way, we have
\[
L(t,T_1^*) = \frac{B(0,T_1^*) - B(0,T^*)}{\delta_n B(0,T^*)}\;\mathcal{E}_t\Big(\int_0^{\cdot}\lambda(u,T_1^*)\cdot dW_u\Big).
\]
Since $B(0,T_1^*) > B(0,T^*)$, it is clear that $L(\cdot,T_1^*)$ follows a strictly positive martingale under $\mathbb{P}_{T^*} = \mathbb{P}$. The next step is to define the forward Libor rate for the date $T_2^*$. For this purpose, we need to introduce first the forward probability measure for the date $T_1^*$. By definition, it is a probability measure $\mathbb{Q}$, which is equivalent to $\mathbb{P}$, and such that processes
\[
U_2(t,T_k^*) = \frac{B(t,T_k^*)}{\delta_{n-1}B(t,T_1^*)}
\]
are $\mathbb{Q}$-local martingales. It is important to observe that the process $U_2(\cdot,T_k^*)$ admits the following representation:
\[
U_2(t,T_k^*) = \frac{\delta_{n-1}^{-1}\,\delta_n\,U_1(t,T_k^*)}{\delta_n L(t,T_1^*) + 1}.
\]
Let us formulate an auxiliary result, which is a straightforward consequence of Itô's rule.

Lemma 2.3 Let $G$ and $H$ be real-valued adapted processes, such that
\[
dG_t = \alpha_t\cdot dW_t, \qquad dH_t = \beta_t\cdot dW_t.
\]
Assume, in addition, that $H_t > -1$ for every $t$ and denote $Y_t = (1 + H_t)^{-1}$. Then
\[
d(Y_tG_t) = Y_t\big(\alpha_t - Y_tG_t\beta_t\big)\cdot\big(dW_t - Y_t\beta_t\,dt\big).
\]

It follows immediately from Lemma 2.3 that
\[
dU_2(t,T_k^*) = \eta_t^k\cdot\Big(dW_t - \frac{\delta_n L(t,T_1^*)}{1 + \delta_n L(t,T_1^*)}\,\lambda(t,T_1^*)\,dt\Big)
\]
for a certain process $\eta^k$. Therefore it is enough to find a probability measure under which the process
\[
W_t^{T_1^*} := W_t - \int_0^t \frac{\delta_n L(u,T_1^*)}{1 + \delta_n L(u,T_1^*)}\,\lambda(u,T_1^*)\,du = W_t - \int_0^t \gamma(u,T_1^*)\,du, \qquad t \in [0,T_1^*],
\]
follows a standard Brownian motion (the definition of $\gamma(\cdot,T_1^*)$ is clear from the context). This can be easily achieved using Girsanov's theorem, as we may put
\[
\frac{d\mathbb{P}_{T_1^*}}{d\mathbb{P}} = \mathcal{E}_{T_1^*}\Big(\int_0^{\cdot}\gamma(u,T_1^*)\cdot dW_u\Big), \qquad \mathbb{P}\text{-a.s.}
\]
We are in a position to specify the dynamics of the forward Libor rate for the date $T_2^*$ under $\mathbb{P}_{T_1^*}$, i.e. we postulate that
\[
dL(t,T_2^*) = L(t,T_2^*)\,\lambda(t,T_2^*)\cdot dW_t^{T_1^*},
\]
with the initial condition
\[
L(0,T_2^*) = \frac{B(0,T_2^*) - B(0,T_1^*)}{\delta_{n-1}B(0,T_1^*)}.
\]

$^6$ Notice that, for simplicity, we have chosen the underlying probability measure $\mathbb{P}$ to play the role of the forward Libor measure for the date $T^*$. This choice is not essential, however.

Let us now assume that we have found processes $L(\cdot, T_1^*), \dots, L(\cdot, T_m^*)$. This means, in particular, that the forward Libor measure $\mathbb{P}_{T_{m-1}^*}$ and the associated Brownian motion $W^{T_{m-1}^*}$ are already specified. Our aim is to determine the forward Libor measure $\mathbb{P}_{T_m^*}$. It is easy to check that

$$U_{m+1}(t, T_k^*) = \frac{\delta_{n-m+1}\, U_m(t, T_k^*)}{\delta_{n-m} \big( \delta_{n-m+1} L(t, T_m^*) + 1 \big)}.$$

Using Lemma 2.3, we obtain the following relationship:

$$W_t^{T_m^*} = W_t^{T_{m-1}^*} - \int_0^t \frac{\delta_{n-m+1} L(u, T_m^*)}{1 + \delta_{n-m+1} L(u, T_m^*)}\, \lambda(u, T_m^*)\, du$$

for $t \in [0, T_m^*]$. The forward Libor measure $\mathbb{P}_{T_m^*}$ can thus be easily found using Girsanov's theorem. Finally, we define the process $L(\cdot, T_{m+1}^*)$ as the solution to the SDE

$$dL(t, T_{m+1}^*) = L(t, T_{m+1}^*)\, \lambda(t, T_{m+1}^*) \cdot dW_t^{T_m^*},$$

with the initial condition

$$L(0, T_{m+1}^*) = \frac{B(0, T_{m+1}^*) - B(0, T_m^*)}{\delta_{n-m} B(0, T_m^*)}.$$
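The initial conditions used in the backward induction can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical list of discount factors $B(0, T_j)$ on consecutive tenor dates; the function name and the input values are illustrative assumptions and do not come from the text. Each rate is computed from the relation $1 + \delta L = B(0, T_j)/B(0, T_{j+1})$ for the accrual period $[T_j, T_{j+1}]$.

```python
def initial_forward_libors(discounts, accruals):
    """Forward Libor rates implied by today's discount factors.

    discounts[j] is assumed to be B(0, T_j) for consecutive dates T_j,
    accruals[j] is the year fraction T_{j+1} - T_j.
    Returns the list of rates L(0, T_j) for the periods [T_j, T_{j+1}].
    """
    if len(accruals) != len(discounts) - 1:
        raise ValueError("need exactly one accrual period per pair of dates")
    rates = []
    for j, delta in enumerate(accruals):
        # L(0, T_j) = (B(0, T_j) - B(0, T_{j+1})) / (delta * B(0, T_{j+1}))
        rates.append((discounts[j] - discounts[j + 1]) / (delta * discounts[j + 1]))
    return rates
```

For a curve built from a flat simply compounded rate, every implied forward Libor rate coincides with that rate, which gives a quick sanity check of the indexing.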

Remarks If the volatility coefficient $\lambda(\cdot, T_m) : [0, T_m] \to \mathbb{R}^d$ is a deterministic function, then for each date $t \in [0, T_m]$ the random variable $L(t, T_m)$ has a lognormal probability law under the forward probability measure $\mathbb{P}_{T_{m+1}}$.

Let us now examine the existence and uniqueness of the implied savings account,7 in a discrete-time setup. Intuitively, the value $B_t^*$ of a savings account at time $t$ can be interpreted as the cash amount accumulated up to time $t$ by rolling over a series of zero-coupon bonds with the shortest maturities available. To find the process $B^*$ in a discrete-tenor framework, we do not have to specify explicitly all bond prices; the knowledge of forward bond prices is sufficient. Indeed, it is clear that

$$F_B(t, T_j, T_{j+1}) = \frac{F_B(t, T_j, T^*)}{F_B(t, T_{j+1}, T^*)} = \frac{B(t, T_j)}{B(t, T_{j+1})}.$$

This in turn yields, upon setting $t = T_j$,

$$F_B(T_j, T_j, T_{j+1}) = 1/B(T_j, T_{j+1}), \tag{2.21}$$

so that the price $B(T_j, T_{j+1})$ of a single-period bond is uniquely specified for every $j$. Though the bond that matures at time $T_j$ does not physically exist after this date, it seems justifiable to consider $F_B(T_j, T_j, T_{j+1})$ as its forward value at time $T_j$ for the next future date $T_{j+1}$. In other words, the spot value at time $T_{j+1}$ of one cash unit received at time $T_j$ equals $B^{-1}(T_j, T_{j+1})$. The discrete-time savings account $B^*$ thus equals (recall that $T_{-1} = 0$)

$$B^*_{T_k} = \prod_{j=0}^{k} F_B(T_{j-1}, T_{j-1}, T_j) = \prod_{j=0}^{k} B^{-1}(T_{j-1}, T_j)$$

for $k = 0, \dots, n$, since, by convention, we set $B_0^* = 1$. Note that $F_B(T_{j-1}, T_{j-1}, T_j) = 1 + \delta_j L(T_{j-1}, T_{j-1}) > 1$ for $j = 0, \dots, n$, and since $B^*_{T_j} = F_B(T_{j-1}, T_{j-1}, T_j)\, B^*_{T_{j-1}}$, we find that $B^*_{T_j} > B^*_{T_{j-1}}$ for every $j = 0, \dots, n$. We conclude that the implied savings account $B^*$ follows a strictly increasing discrete-time process.

7 The interested reader is referred to Musiela and Rutkowski (1997b) for the definition of an implied savings account in a continuous-time setup. See also Döberlein and Schweizer (1998) and Döberlein et al. (2000) for further developments and the general uniqueness result.

Let us define the probability measure $\mathbb{P}^*$ equivalent to $\mathbb{P}$ on $(\Omega, \mathcal{F}_{T^*})$ by the formula8

$$\frac{d\mathbb{P}^*}{d\mathbb{P}} = B^*_{T^*}\, B(0, T^*), \quad \mathbb{P}\text{-a.s.} \tag{2.22}$$

The probability measure $\mathbb{P}^*$ appears to be a plausible candidate for a spot martingale measure. Indeed, if we set

$$B(T_l, T_k) = \mathbb{E}_{\mathbb{P}^*}\big( B^*_{T_l} (B^*_{T_k})^{-1} \,\big|\, \mathcal{F}_{T_l} \big) \tag{2.23}$$

for every $l \le k \le n$, then in the case of $l = k - 1$, equality (2.23) coincides with (2.21). Let us observe that it is not possible to uniquely determine the continuous-time dynamics of a bond price $B(t, T_j)$ within the framework of the discrete-tenor model of forward Libor rates (the specification of forward Libor rates for all maturities is necessary for this purpose).

8 Recall that $\mathbb{P}$ plays the role of the forward Libor measure for the date $T^*$. Therefore, formula (2.22) is a consequence of the standard definition of a forward measure.

2.2.4 Jamshidian's approach

The backward induction approach to modelling of forward Libor rates presented in the preceding section was re-examined and essentially generalized by Jamshidian (1997). In this section, we present briefly his approach to the modelling of forward Libor rates. As made apparent in the preceding section, in the direct modelling of Libor rates, no explicit reference is made to the bond price processes, which are used to formally define a forward Libor rate through equality (2.12). Nevertheless, to explain the idea that underpins Jamshidian's approach, we shall temporarily assume that we are given a family of bond prices $B(t, T_j)$ for the future dates $T_j$, $j = 1, \dots, n$. By definition, the spot Libor measure is that probability measure equivalent to $\mathbb{P}$, under which all relative bond prices are local martingales, when the price process obtained by rolling over single-period bonds is taken as a numeraire. The existence of such a measure can be either postulated or derived from other conditions.9 Let us put, for $t \in [0, T^*]$ (as before $T_{-1} = 0$)

$$G_t = B(t, T_{m(t)}) \prod_{j=0}^{m(t)} B^{-1}(T_{j-1}, T_j), \tag{2.24}$$

where

$$m(t) = \inf\Big\{ k = 0, 1, \dots \,\Big|\, \sum_{i=0}^{k} \delta_i \ge t \Big\} = \inf\{ k = 0, 1, \dots \mid T_k \ge t \}.$$
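Definition (2.24) and the index $m(t)$ can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names, the tenor grid and the bond prices below are hypothetical, and the single-period prices $B(T_{j-1}, T_j)$ for $j \le m(t)$ are assumed to have been observed already (they are fixed at $T_{j-1} < t$).

```python
from bisect import bisect_left


def m_index(t, tenor_dates):
    """m(t) = inf{k : T_k >= t} for an increasing list T_0 < T_1 < ... < T_n."""
    k = bisect_left(tenor_dates, t)
    if k == len(tenor_dates):
        raise ValueError("t lies beyond the last tenor date")
    return k


def rolling_numeraire(t, tenor_dates, single_period_prices, bond_to_next_date):
    """G_t = B(t, T_m(t)) * prod_{j=0}^{m(t)} 1 / B(T_{j-1}, T_j), as in (2.24).

    single_period_prices[j] is assumed to hold B(T_{j-1}, T_j), with T_{-1} = 0;
    bond_to_next_date is assumed to hold B(t, T_m(t)).
    """
    m = m_index(t, tenor_dates)
    value = bond_to_next_date
    for j in range(m + 1):
        value /= single_period_prices[j]
    return value
```

At $t = 0$ the sketch returns $G_0 = B(0, T_0)/B(0, T_0) = 1$, in line with the interpretation of $G$ as the wealth of a portfolio started with one unit of cash.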

It is easily seen that $G_t$ represents the wealth at time $t$ of a portfolio which starts at time 0 with one unit of cash invested in a zero-coupon bond of maturity $T_0$, and whose wealth is then reinvested at each date $T_j$, $j = 0, \dots, n - 1$, in zero-coupon bonds which mature at the next date; that is, $T_{j+1}$.

Definition 2.4 A spot Libor measure, denoted by $\mathbb{P}^L$, is a probability measure on $(\Omega, \mathcal{F}_{T^*})$ which is equivalent to $\mathbb{P}$, and such that for any $j = 0, \dots, n$ the relative bond price $B(t, T_j)/G_t$ follows a local martingale under $\mathbb{P}^L$.

Note that

$$B(t, T_{k+1})/G_t = \prod_{j=0}^{m(t)} \big( 1 + \delta_j L(T_{j-1}, T_{j-1}) \big)^{-1} \prod_{j=m(t)+1}^{k+1} \big( 1 + \delta_j L(t, T_{j-1}) \big)^{-1},$$

so that all relative bond prices $B(t, T_j)/G_t$, $j = 0, \dots, n$, are uniquely determined by a collection of forward Libor rates. In this sense, $G$ is the correct choice of the reference price process in the present setting. We shall now concentrate on the derivation of the dynamics under $\mathbb{P}^L$ of forward Libor rates $L(\cdot, T_j)$, $j = 0, \dots, n - 1$. Our aim is to show that these dynamics involve only the volatilities of forward Libor rates (as opposed to volatilities of bond prices or other processes). Therefore, it is possible to define the whole family of forward Libor rates simultaneously under one probability measure (of course, this feature can also be deduced from the preceding construction). To facilitate the derivation of the dynamics of $L(\cdot, T_j)$, we postulate temporarily that bond prices $B(t, T_j)$ follow Itô processes under the underlying probability measure $\mathbb{P}$, more explicitly

$$dB(t, T_j) = B(t, T_j) \big( a(t, T_j)\, dt + b(t, T_j) \cdot dW_t \big) \tag{2.25}$$

for every $j = 0, \dots, n$, where, as before, $W$ is a $d$-dimensional standard Brownian motion under an underlying probability measure $\mathbb{P}$ (it should be stressed, however, that we do not assume here that $\mathbb{P}$ is a forward (or spot) martingale measure). Combining (2.24) with (2.25), we obtain

$$dG_t = G_t \big( a(t, T_{m(t)})\, dt + b(t, T_{m(t)}) \cdot dW_t \big). \tag{2.26}$$

Furthermore, by applying Itô's rule to the equality

$$1 + \delta_{j+1} L(t, T_j) = \frac{B(t, T_j)}{B(t, T_{j+1})}, \tag{2.27}$$

we find that

$$dL(t, T_j) = \mu(t, T_j)\, dt + \zeta(t, T_j) \cdot dW_t,$$

where

$$\mu(t, T_j) = \frac{B(t, T_j)}{\delta_{j+1} B(t, T_{j+1})} \big( a(t, T_j) - a(t, T_{j+1}) \big) - \zeta(t, T_j) \cdot b(t, T_{j+1})$$

and

$$\zeta(t, T_j) = \frac{B(t, T_j)}{\delta_{j+1} B(t, T_{j+1})} \big( b(t, T_j) - b(t, T_{j+1}) \big). \tag{2.28}$$

Using (2.27) and the last formula, we arrive at the following relationship:

$$b(t, T_{m(t)}) - b(t, T_{j+1}) = \sum_{k=m(t)}^{j} \frac{\delta_{k+1}\, \zeta(t, T_k)}{1 + \delta_{k+1} L(t, T_k)}. \tag{2.29}$$

By definition of a spot Libor measure $\mathbb{P}^L$, each relative price $B(t, T_j)/G_t$ follows a local martingale under $\mathbb{P}^L$. Since, in addition, $\mathbb{P}^L$ is assumed to be equivalent to $\mathbb{P}$, it is clear that it is given by the Doléans exponential, that is

$$\frac{d\mathbb{P}^L}{d\mathbb{P}} = \mathcal{E}_{T^*}\Big( \int_0^{\cdot} h_u \cdot dW_u \Big), \quad \mathbb{P}\text{-a.s.}$$

for some adapted process $h$. It is not hard to check, using Itô's rule, that $h$ necessarily satisfies, for $t \in [0, T_j]$,

$$a(t, T_j) - a(t, T_{m(t)}) = \big( b(t, T_{m(t)}) - h_t \big) \cdot \big( b(t, T_j) - b(t, T_{m(t)}) \big)$$

for every $j = 0, \dots, n$. Combining (2.28) with the last formula, we obtain

$$\frac{B(t, T_j)}{\delta_{j+1} B(t, T_{j+1})} \big( a(t, T_j) - a(t, T_{j+1}) \big) = \zeta(t, T_j) \cdot \big( b(t, T_{m(t)}) - h_t \big),$$

and this in turn yields

$$dL(t, T_j) = \zeta(t, T_j) \cdot \Big( \big( b(t, T_{m(t)}) - b(t, T_{j+1}) - h_t \big)\, dt + dW_t \Big).$$

9 One may assume, e.g., that bond prices $B(t, T_j)$ satisfy the weak no-arbitrage condition, meaning that there exists a probability measure $\tilde{\mathbb{P}}$, equivalent to $\mathbb{P}$, and such that all processes $B(t, T_k)/B(t, T^*)$ are $\tilde{\mathbb{P}}$-local martingales.


Using (2.29), we conclude that the process $L(\cdot, T_j)$ satisfies

$$dL(t, T_j) = \sum_{k=m(t)}^{j} \frac{\delta_{k+1}\, \zeta(t, T_k) \cdot \zeta(t, T_j)}{1 + \delta_{k+1} L(t, T_k)}\, dt + \zeta(t, T_j) \cdot dW_t^L,$$

where the process $W_t^L = W_t - \int_0^t h_u\, du$ follows a $d$-dimensional standard Brownian motion under the spot Libor measure $\mathbb{P}^L$. To further specify the model, we assume that processes $\zeta(t, T_j)$, $j = 0, \dots, n - 1$, have the following form, for $t \in [0, T_j]$,

$$\zeta(t, T_j) = \lambda_j\big( t, L(t, T_j), L(t, T_{j+1}), \dots, L(t, T_n) \big),$$

where $\lambda_j : [0, T_j] \times \mathbb{R}^{n-j+1} \to \mathbb{R}^d$ are given functions. In this way, we obtain a system of SDEs

$$dL(t, T_j) = \sum_{k=m(t)}^{j} \frac{\delta_{k+1}\, \lambda_k(t, L_k(t)) \cdot \lambda_j(t, L_j(t))}{1 + \delta_{k+1} L(t, T_k)}\, dt + \lambda_j(t, L_j(t)) \cdot dW_t^L,$$

where we write $L_j(t) = (L(t, T_j), L(t, T_{j+1}), \dots, L(t, T_n))$. Under mild regularity assumptions, this system can be solved recursively, starting from $L(\cdot, T_{n-1})$. The lognormal model of forward Libor rates corresponds to the choice of $\zeta(t, T_j) = \lambda(t, T_j) L(t, T_j)$, where $\lambda(\cdot, T_j) : [0, T_j] \to \mathbb{R}^d$ is a deterministic function for every $j$.
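To make the drift under the spot Libor measure concrete, the following sketch evaluates it in the lognormal case with a single driving Brownian motion ($d = 1$) and performs one explicit Euler step. The discretization scheme, the function names and the convention `accruals[k]` $= \delta_{k+1}$ are assumptions made for illustration; the text itself does not prescribe any numerical scheme.

```python
def spot_measure_drift(j, m, libors, vols, accruals):
    """Drift of L(t, T_j) under the spot Libor measure, one-factor lognormal case.

    libors[k] = L(t, T_k), vols[k] = lambda(t, T_k) (scalar), accruals[k] = delta_{k+1}.
    m is the index m(t); the sum runs over k = m, ..., j.
    """
    zeta_j = vols[j] * libors[j]
    total = 0.0
    for k in range(m, j + 1):
        zeta_k = vols[k] * libors[k]
        total += accruals[k] * zeta_k * zeta_j / (1.0 + accruals[k] * libors[k])
    return total


def euler_step(m, libors, vols, accruals, dt, dw):
    """One explicit Euler step for all rates L(., T_j) with j >= m.

    dw is the Brownian increment over the step; rates with j < m have
    already been fixed and are left unchanged.
    """
    updated = list(libors)
    for j in range(m, len(libors)):
        drift = spot_measure_drift(j, m, libors, vols, accruals)
        updated[j] = libors[j] + drift * dt + vols[j] * libors[j] * dw
    return updated
```

Note that the drift of $L(\cdot, T_j)$ only involves rates with indices between $m(t)$ and $j$, so all rates can be advanced simultaneously under the single measure $\mathbb{P}^L$.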

2.3 Dynamics of Libor rates and bond prices

We assume that the volatilities of processes $L(\cdot, T_j)$ follow deterministic functions. Put another way, we place ourselves within the framework of the lognormal model of forward Libor rates. It is interesting to note that in all approaches, there is a uniquely determined correspondence between forward measures (and forward Brownian motions) associated with different dates $T_0, \dots, T_n$. On the other hand, however, there is a considerable degree of ambiguity in the way in which the spot martingale measure is specified (in some instances, it is not introduced at all). Consequently, the futures Libor rate $L^f(\cdot, T_j)$, which equals (cf. Section 2.1.3)

$$L^f(t, T_j) = \mathbb{E}_{\mathbb{P}^*}\big( L(T_j, T_j) \mid \mathcal{F}_t \big) = \mathbb{E}_{\mathbb{P}^*}\big( \tilde{L}(T_j, T_j) \mid \mathcal{F}_t \big), \tag{2.30}$$

is not necessarily specified in the same way in various approaches to the lognormal model of forward Libor rates. For this reason, we start by examining the distributional properties of forward Libor rates, which are identical in all above-mentioned models.

For a given function $g : \mathbb{R} \to \mathbb{R}$ and a fixed date $u \le T_j$, we are interested in the payoff of the form $X = g(L(u, T_j))$ which settles at time $T_j$. Particular cases of such payoffs are

$$X_1 = g\big( B^{-1}(T_j, T_{j+1}) \big), \quad X_2 = g\big( B(T_j, T_{j+1}) \big), \quad X_3 = g\big( F_B(u, T_{j+1}, T_j) \big).$$

Recall that

$$B^{-1}(T_j, T_{j+1}) = 1 + \delta_{j+1} L(T_j, T_j) = 1 + \delta_{j+1} \tilde{L}(T_j, T_j) = 1 + \delta_{j+1} L^f(T_j, T_j).$$

The choice of the "pricing measure" is thus largely a matter of convenience. Similarly, we have

$$B(T_j, T_{j+1}) = \frac{1}{1 + \delta_{j+1} L(T_j, T_j)} = F_B(T_j, T_{j+1}, T_j). \tag{2.31}$$

More generally, the forward price of a $T_{j+1}$-maturity bond for the settlement date $T_j$ equals

$$F_B(u, T_{j+1}, T_j) = \frac{B(u, T_{j+1})}{B(u, T_j)} = \frac{1}{1 + \delta_{j+1} L(u, T_j)}. \tag{2.32}$$

Generally speaking, to value the claim $X = g(L(u, T_j)) = \tilde{g}(F_B(u, T_{j+1}, T_j))$ which settles at time $T_j$ we may use the formula

$$\pi_t(X) = B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}(X \mid \mathcal{F}_t), \quad \forall\, t \in [0, T_j].$$

It is thus clear that to value a claim in the case $u \le T_j$, it is enough to know the dynamics of either $L(\cdot, T_j)$ or $F_B(\cdot, T_{j+1}, T_j)$ under the forward probability measure $\mathbb{P}_{T_j}$. If $u = T_j$, we may equally well use the dynamics, under $\mathbb{P}_{T_j}$, of either $\tilde{L}(\cdot, T_j)$ or $L^f(\cdot, T_j)$. For instance,

$$\pi_t(X_1) = B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( B^{-1}(T_j, T_{j+1}) \mid \mathcal{F}_t \big) = B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( F_B^{-1}(T_j, T_{j+1}, T_j) \mid \mathcal{F}_t \big),$$

but also

$$\pi_t(X_1) = B(t, T_j) \Big( 1 + \delta_{j+1}\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( Z(T_j) \mid \mathcal{F}_t \big) \Big),$$

where $Z(T_j) = L(T_j, T_j) = \tilde{L}(T_j, T_j) = L^f(T_j, T_j)$.

2.3.1 Dynamics of $L(\cdot, T_j)$ under $\mathbb{P}_{T_j}$

We shall now derive the transition probability density function (p.d.f.) of the process $L(\cdot, T_j)$ under the forward probability measure $\mathbb{P}_{T_j}$. Let us first prove the following related result, due to Jamshidian (1997).

Proposition 2.5 Let $t \le u \le T_j$. Then

$$\mathbb{E}_{\mathbb{P}_{T_j}}\big( L(u, T_j) \mid \mathcal{F}_t \big) = L(t, T_j) + \frac{\delta_{j+1}\, \mathrm{Var}_{\mathbb{P}_{T_{j+1}}}\big( L(u, T_j) \mid \mathcal{F}_t \big)}{1 + \delta_{j+1} L(t, T_j)}. \tag{2.33}$$


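In the case of the lognormal model of Libor rates, we have

$$\mathbb{E}_{\mathbb{P}_{T_j}}\big( L(u, T_j) \mid \mathcal{F}_t \big) = L(t, T_j) \bigg( 1 + \frac{\delta_{j+1} L(t, T_j) \big( e^{v_j^2(t,u)} - 1 \big)}{1 + \delta_{j+1} L(t, T_j)} \bigg), \tag{2.34}$$

where

$$v_j^2(t, u) = \mathrm{Var}_{\mathbb{P}_{T_{j+1}}}\Big( \int_t^u \lambda(s, T_j) \cdot dW_s^{T_{j+1}} \Big) = \int_t^u |\lambda(s, T_j)|^2\, ds. \tag{2.35}$$

In particular, the modified Libor rate $\tilde{L}(t, T_j)$ satisfies10

$$\tilde{L}(t, T_j) = \mathbb{E}_{\mathbb{P}_{T_j}}\big( L(T_j, T_j) \mid \mathcal{F}_t \big) = L(t, T_j) \bigg( 1 + \frac{\delta_{j+1} L(t, T_j) \big( e^{v_j^2(t,T_j)} - 1 \big)}{1 + \delta_{j+1} L(t, T_j)} \bigg).$$

10 This equality can be referred to as the convexity correction.

Proof Combining (2.5) with the martingale property of the process $L(\cdot, T_j)$ under $\mathbb{P}_{T_{j+1}}$, we obtain

$$\mathbb{E}_{\mathbb{P}_{T_j}}\big( L(u, T_j) \mid \mathcal{F}_t \big) = \frac{\mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big( (1 + \delta_{j+1} L(u, T_j))\, L(u, T_j) \mid \mathcal{F}_t \big)}{1 + \delta_{j+1} L(t, T_j)},$$

so that

$$\mathbb{E}_{\mathbb{P}_{T_j}}\big( L(u, T_j) \mid \mathcal{F}_t \big) = L(t, T_j) + \frac{\delta_{j+1}\, \mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big( (L(u, T_j) - L(t, T_j))^2 \mid \mathcal{F}_t \big)}{1 + \delta_{j+1} L(t, T_j)}.$$

In the case of the lognormal model, we have

$$L(u, T_j) = L(t, T_j)\, e^{\eta_j(t,u) - \frac{1}{2} v_j^2(t,u)},$$

where

$$\eta_j(t, u) = \int_t^u \lambda(s, T_j) \cdot dW_s^{T_{j+1}}. \tag{2.36}$$

Consequently,

$$\mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big( (L(u, T_j) - L(t, T_j))^2 \mid \mathcal{F}_t \big) = L^2(t, T_j) \big( e^{v_j^2(t,u)} - 1 \big).$$

This gives the desired equality (2.34). The last asserted equality is a consequence of (2.6).

To derive the transition probability density function (p.d.f.) of the process $L(\cdot, T_j)$, notice that for any $t \le u \le T_j$, and any bounded Borel measurable function $g : \mathbb{R} \to \mathbb{R}$ we have

$$\mathbb{E}_{\mathbb{P}_{T_j}}\big( g(L(u, T_j)) \mid \mathcal{F}_t \big) = \frac{\mathbb{E}_{\mathbb{P}_{T_{j+1}}}\big( g(L(u, T_j)) (1 + \delta_{j+1} L(u, T_j)) \mid \mathcal{F}_t \big)}{1 + \delta_{j+1} L(t, T_j)}.$$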


The following simple lemma appears to be useful.

Lemma 2.6 Let $\zeta$ be a nonnegative random variable on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with the probability density function $f_{\mathbb{P}}$. Let $\mathbb{Q}$ be a probability measure equivalent to $\mathbb{P}$. Suppose that for any bounded Borel measurable function $g : \mathbb{R} \to \mathbb{R}$ we have

$$\mathbb{E}_{\mathbb{P}}\big( g(\zeta) \big) = \mathbb{E}_{\mathbb{Q}}\big( (1 + \zeta)\, g(\zeta) \big).$$

Then the p.d.f. $f_{\mathbb{Q}}$ of $\zeta$ under $\mathbb{Q}$ satisfies $f_{\mathbb{P}}(y) = (1 + y) f_{\mathbb{Q}}(y)$.

Proof The assertion is in fact trivial since, by assumption,

$$\int_{-\infty}^{\infty} g(y) f_{\mathbb{P}}(y)\, dy = \int_{-\infty}^{\infty} g(y) (1 + y) f_{\mathbb{Q}}(y)\, dy$$

for any bounded Borel measurable function $g : \mathbb{R} \to \mathbb{R}$.

Assume the lognormal model of Libor rates and fix $x \in \mathbb{R}$. Recall that for any $t \le u$ we have

$$L(u, T_j) = L(t, T_j) \exp\Big( \eta_j(t, u) - \tfrac{1}{2} \mathrm{Var}_{\mathbb{P}_{T_{j+1}}}\big( \eta_j(t, u) \big) \Big),$$

where $\eta_j(t, u)$ is given by (2.36) (so that it is independent of the $\sigma$-field $\mathcal{F}_t$). The Markov property of $L(\cdot, T_j)$ under the forward measure $\mathbb{P}_{T_{j+1}}$ is thus apparent. Denote by $p_L(t, x; u, y)$ the transition p.d.f. under $\mathbb{P}_{T_{j+1}}$ of the process $L(\cdot, T_j)$. Elementary calculations involving Gaussian densities yield

$$p_L(t, x; u, y) = \mathbb{P}_{T_{j+1}}\{ L(u, T_j) = y \mid L(t, T_j) = x \} = \frac{1}{\sqrt{2\pi}\, v_j(t, u)\, y} \exp\bigg( -\frac{\big( \ln(y/x) + \frac{1}{2} v_j^2(t, u) \big)^2}{2 v_j^2(t, u)} \bigg)$$

for any $x, y > 0$ and $t < u$. Taking into account Lemma 2.6, we conclude that the transition p.d.f. of the process11 $L(\cdot, T_j)$, under the forward probability measure $\mathbb{P}_{T_j}$, satisfies

$$\tilde{p}_L(t, x; u, y) = \mathbb{P}_{T_j}\{ L(u, T_j) = y \mid L(t, T_j) = x \} = \frac{1 + \delta_{j+1} y}{1 + \delta_{j+1} x}\, p_L(t, x; u, y).$$

We are in a position to state the following result, which can be used, for instance, to value a contingent claim of the form $X = h(L(T_j))$ which settles at time $T_j$ (see Schmidt (1996)).

11 The Markov property of $L(\cdot, T_j)$ under $\mathbb{P}_{T_j}$ can be easily deduced from the Markovian features of the forward price $F_B(\cdot, T_j, T_{j+1})$ under $\mathbb{P}_{T_j}$ (see formulae (2.37)–(2.38)).


Corollary 2.7 The transition p.d.f. under $\mathbb{P}_{T_j}$ of the forward Libor rate $L(\cdot, T_j)$ equals, for any $t < u$ and $x, y > 0$,

$$\tilde{p}_L(t, x; u, y) = \frac{1 + \delta_{j+1} y}{\sqrt{2\pi}\, v_j(t, u)\, y\, (1 + \delta_{j+1} x)} \exp\bigg( -\frac{\big( \ln(y/x) + \frac{1}{2} v_j^2(t, u) \big)^2}{2 v_j^2(t, u)} \bigg).$$

2.3.2 Dynamics of $F_B(\cdot, T_{j+1}, T_j)$ under $\mathbb{P}_{T_j}$

Observe that the forward bond price $F_B(\cdot, T_{j+1}, T_j)$ satisfies

$$F_B(t, T_{j+1}, T_j) = \frac{B(t, T_{j+1})}{B(t, T_j)} = \frac{1}{1 + \delta_{j+1} L(t, T_j)}. \tag{2.37}$$

First, this implies that in the lognormal model of Libor rates, the dynamics of the forward bond price $F_B(\cdot, T_{j+1}, T_j)$ are governed by the following stochastic differential equation, under $\mathbb{P}_{T_j}$,

$$dF_B(t) = -F_B(t) \big( 1 - F_B(t) \big)\, \lambda(t, T_j) \cdot dW_t^{T_j}, \tag{2.38}$$

where we write $F_B(t) = F_B(t, T_{j+1}, T_j)$. If the initial condition satisfies $0 < F_B(0) < 1$, this equation can be shown to admit a unique strong solution (it satisfies $0 < F_B(t) < 1$ for every $t > 0$). This makes clear that the process $F_B(\cdot, T_{j+1}, T_j)$, and thus also the process $L(\cdot, T_j)$, is Markovian under $\mathbb{P}_{T_j}$. Using Corollary 2.7 and relationship (2.37), one can find the transition p.d.f. of the Markov process $F_B(\cdot, T_{j+1}, T_j)$ under $\mathbb{P}_{T_j}$; that is,

$$p_B(t, x; u, y) = \mathbb{P}_{T_j}\{ F_B(u, T_{j+1}, T_j) = y \mid F_B(t, T_{j+1}, T_j) = x \}.$$

We have the following result (see Rady and Sandmann (1994), Miltersen et al. (1997), and Jamshidian (1997)).

Corollary 2.8 The transition p.d.f. under $\mathbb{P}_{T_j}$ of the forward bond price $F_B(\cdot, T_{j+1}, T_j)$ equals, for any $t < u$ and arbitrary $0 < x, y < 1$,

$$p_B(t, x; u, y) = \frac{x}{\sqrt{2\pi}\, v_j(t, u)\, y^2 (1 - y)} \exp\Bigg( -\frac{\Big( \ln \frac{x(1-y)}{y(1-x)} + \frac{1}{2} v_j^2(t, u) \Big)^2}{2 v_j^2(t, u)} \Bigg).$$

Proof Let us fix $x \in (0, 1)$. Using (2.37), it is easy to show that

$$p_B(t, x; u, y) = \delta^{-1} y^{-2}\, \tilde{p}_L\Big( t, \frac{1 - x}{\delta x}; u, \frac{1 - y}{\delta y} \Big),$$

where $\delta = \delta_{j+1}$. The formula now follows from Corollary 2.7.


Let us observe that the results of this section can be applied to value the so-called irregular cash flows, such as caps or floors settled in advance (for more details on this issue we refer to Schmidt (1996)).
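The closed-form results of this section can be cross-checked numerically. The sketch below integrates against the transition density of Corollary 2.7 and compares the resulting conditional mean with formula (2.34). The trapezoid rule in the variable $z = \ln(y/x)$, the grid size and all function names are illustrative assumptions, not a method taken from the text.

```python
import math


def density_under_forward_measure(x, y, v, delta):
    """Transition density of Corollary 2.7, with v = v_j(t, u) and delta = delta_{j+1}."""
    gauss = math.exp(-(math.log(y / x) + 0.5 * v * v) ** 2 / (2.0 * v * v))
    return (1.0 + delta * y) * gauss / (math.sqrt(2.0 * math.pi) * v * y * (1.0 + delta * x))


def expected_libor_closed_form(x, v, delta):
    """Right-hand side of (2.34) for L(t, T_j) = x."""
    return x * (1.0 + delta * x * (math.exp(v * v) - 1.0) / (1.0 + delta * x))


def integrate_against_density(payoff, x, v, delta, n=4000, width=10.0):
    """Trapezoid rule for the integral of payoff(y) times the density, in z = ln(y / x)."""
    lower = -width * v
    step = 2.0 * width * v / n
    total = 0.0
    for i in range(n + 1):
        y = x * math.exp(lower + i * step)
        weight = 0.5 if i in (0, n) else 1.0
        # dy = y dz under the substitution y = x * exp(z)
        total += weight * payoff(y) * density_under_forward_measure(x, y, v, delta) * y
    return total * step
```

For example, with `x = 0.05`, `v = 0.3` and `delta = 0.5`, calling `integrate_against_density(lambda y: y, x, v, delta)` should agree with `expected_libor_closed_form(x, v, delta)` up to quadrature error, and integrating the constant payoff 1 should return a total mass close to one.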

2.4 Caps and floors

An interest rate cap (known also as a ceiling rate agreement) is a contractual arrangement where the grantor (seller) has an obligation to pay cash to the holder (buyer) if a particular interest rate exceeds a mutually agreed level at some future date or dates. Similarly, in an interest rate floor, the grantor has an obligation to pay cash to the holder if the interest rate is below a preassigned level. When cash is paid to the holder, the holder's net position is equivalent to borrowing (or depositing) at a rate fixed at that agreed level. This assumes that the holder of a cap (or floor) agreement also holds an underlying asset (such as a deposit) or an underlying liability (such as a loan). Finally, the holder is not affected by the agreement if the interest rate is ultimately more favorable to him than the agreed level. This feature of a cap (or floor) agreement makes it similar to an option.

Specifically, a forward start cap (or a forward start floor) is a strip of caplets (floorlets), each of which is a call (put) option on a forward rate, respectively. Let us denote by $\kappa$ and by $\delta_j$ the cap strike rate and the length of the accrual period, respectively. We shall check that an interest rate caplet (i.e., one leg of a cap) may also be seen as a put option with strike price 1 (per dollar of notional principal) which expires at the caplet start day on a discount bond with face value $1 + \kappa \delta_j$ which matures at the caplet end date.

Similarly to swap agreements, interest rate caps and floors may be settled either in arrears or in advance. In a forward cap or floor, which starts at time $T_0$, and is settled in arrears at dates $T_j$, $j = 1, \dots, n$, the cash flows at times $T_j$ are $N_p (L(T_{j-1}) - \kappa)^+ \delta_j$ and $N_p (\kappa - L(T_{j-1}))^+ \delta_j$, respectively, where $N_p$ stands for the notional principal (recall that $\delta_j = T_j - T_{j-1}$). As usual, the rate $L(T_{j-1}) = L(T_{j-1}, T_{j-1})$ is determined at the reset date $T_{j-1}$, and it satisfies

$$B(T_{j-1}, T_j)^{-1} = 1 + \delta_j L(T_{j-1}). \tag{2.39}$$

The price at time $t \le T_0$ of a forward cap, denoted by $FC_t$, is (we set $N_p = 1$)

$$FC_t = \sum_{j=1}^{n} \mathbb{E}_{\mathbb{P}^*}\Big( \frac{B_t}{B_{T_j}} (L(T_{j-1}) - \kappa)^+ \delta_j \,\Big|\, \mathcal{F}_t \Big) = \sum_{j=1}^{n} B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( (L(T_{j-1}) - \kappa)^+ \delta_j \mid \mathcal{F}_t \big). \tag{2.40}$$

On the other hand, since the cash flow of the $j$th caplet at time $T_j$ is manifestly an $\mathcal{F}_{T_{j-1}}$-measurable random variable, we may directly express the value of the cap in terms of expectations under forward measures $\mathbb{P}_{T_{j-1}}$, $j = 1, \dots, n$. Indeed, we have

$$FC_t = \sum_{j=1}^{n} B(t, T_{j-1})\, \mathbb{E}_{\mathbb{P}_{T_{j-1}}}\big( B(T_{j-1}, T_j) (L(T_{j-1}) - \kappa)^+ \delta_j \mid \mathcal{F}_t \big). \tag{2.41}$$

Consequently, using (2.39) we get the equality

$$FC_t = \sum_{j=1}^{n} B(t, T_{j-1})\, \mathbb{E}_{\mathbb{P}_{T_{j-1}}}\big( (1 - \tilde{\delta}_j B(T_{j-1}, T_j))^+ \mid \mathcal{F}_t \big), \tag{2.42}$$

which is valid for every $t \in [0, T]$, where $\tilde{\delta}_j = 1 + \kappa \delta_j$. It is apparent that a caplet is essentially equivalent to a put option on a zero-coupon bond; it may also be seen as an option on a single-period swap.

The equivalence of a caplet and a put option on a zero-coupon bond can be explained in an intuitive way. For this purpose, it is enough to examine two basic features of both contracts: the exercise set and the payoff value. Let us consider the $j$th caplet. A caplet is exercised at time $T_{j-1}$ if and only if $L(T_{j-1}) - \kappa > 0$, or, equivalently, if

$$B(T_{j-1}, T_j)^{-1} = 1 + L(T_{j-1})(T_j - T_{j-1}) > 1 + \kappa \delta_j = \tilde{\delta}_j.$$

The last inequality holds whenever $\tilde{\delta}_j B(T_{j-1}, T_j) < 1$. This shows that both of the considered options are exercised in the same circumstances. If exercised, the caplet pays $\delta_j (L(T_{j-1}) - \kappa)$ at time $T_j$, or equivalently

$$\delta_j B(T_{j-1}, T_j) (L(T_{j-1}) - \kappa) = 1 - \tilde{\delta}_j B(T_{j-1}, T_j) = \tilde{\delta}_j \big( \tilde{\delta}_j^{-1} - B(T_{j-1}, T_j) \big)$$

at time $T_{j-1}$. This shows once again that the $j$th caplet, with strike level $\kappa$ and nominal value 1, is essentially equivalent to a put option with strike price $(1 + \kappa \delta_j)^{-1}$ and nominal value $\tilde{\delta}_j = 1 + \kappa \delta_j$ written on the corresponding zero-coupon bond with maturity $T_j$.

The analysis of a floor contract can be done along similar lines. By definition, the $j$th floorlet pays $(\kappa - L(T_{j-1}))^+ \delta_j$ at time $T_j$. Therefore,

$$FF_t = \sum_{j=1}^{n} \mathbb{E}_{\mathbb{P}^*}\Big( \frac{B_t}{B_{T_j}} (\kappa - L(T_{j-1}))^+ \delta_j \,\Big|\, \mathcal{F}_t \Big), \tag{2.43}$$

but also

$$FF_t = \sum_{j=1}^{n} B(t, T_{j-1})\, \mathbb{E}_{\mathbb{P}_{T_{j-1}}}\big( (\tilde{\delta}_j B(T_{j-1}, T_j) - 1)^+ \mid \mathcal{F}_t \big). \tag{2.44}$$

Combining (2.40) with (2.43) (or (2.42) with (2.44)), we obtain the following cap–floor parity relationship

$$FC_t - FF_t = \sum_{j=1}^{n} \big( B(t, T_{j-1}) - \tilde{\delta}_j B(t, T_j) \big), \tag{2.45}$$

which is also an immediate consequence of the no-arbitrage property, so that it does not depend on the model's choice.

2.4.1 Market valuation formula for caps and floors

The main motivation for the introduction of a lognormal model of Libor rates was the market practice of pricing caps and swaptions by means of Black–Scholes-like formulae. For this reason, we shall first describe how market practitioners value caps. The formulae commonly used by practitioners assume that the underlying instrument follows a geometric Brownian motion under some probability measure, $\mathbb{Q}$ say. Since the formal definition of this probability measure is not available, we shall informally refer to $\mathbb{Q}$ as the market probability.

Let us consider an interest rate cap with expiry date $T$ and fixed strike level $\kappa$. Market practice is to price the option assuming that the underlying forward interest rate process is lognormally distributed with zero drift. Let us first consider a caplet, that is, one leg of a cap. Assume that the forward Libor rate $L(t, T)$, $t \in [0, T]$, for the accrual period of length $\delta$ follows a geometric Brownian motion under the "market probability", $\mathbb{Q}$ say. More specifically,

$$dL(t, T) = L(t, T)\, \sigma\, dW_t, \tag{2.46}$$

where $W$ follows a one-dimensional standard Brownian motion under $\mathbb{Q}$, and $\sigma$ is a strictly positive constant. The unique solution of (2.46) is

$$L(t, T) = L(0, T) \exp\big( \sigma W_t - \tfrac{1}{2} \sigma^2 t \big), \quad \forall\, t \in [0, T], \tag{2.47}$$

where the initial condition is derived from the yield curve $Y(0, T)$, namely

$$1 + \delta L(0, T) = \frac{B(0, T)}{B(0, T + \delta)} = \exp\big( (T + \delta) Y(0, T + \delta) - T\, Y(0, T) \big).$$

The "market price" at time $t$ of a caplet with expiry date $T$ and strike level $\kappa$ is calculated by means of the formula

$$FC_t = \delta B(t, T + \delta)\, \mathbb{E}_{\mathbb{Q}}\big( (L(T, T) - \kappa)^+ \mid \mathcal{F}_t \big).$$

More explicitly, for any $t \in [0, T]$ we have

$$FC_t = \delta B(t, T + \delta) \Big( L(t, T) N\big( \hat{e}_1(t, T) \big) - \kappa N\big( \hat{e}_2(t, T) \big) \Big), \tag{2.48}$$


where $N$ is the standard Gaussian cumulative distribution function

$$N(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\, dz, \quad \forall\, x \in \mathbb{R},$$

and

$$\hat{e}_{1,2}(t, T) = \frac{\ln(L(t, T)/\kappa) \pm \frac{1}{2} \hat{v}_0^2(t, T)}{\hat{v}_0(t, T)}$$

with $\hat{v}_0^2(t, T) = \sigma^2 (T - t)$. This means that market practitioners price caplets using Black's formula, with discount from the settlement date $T + \delta$. A cap settled in arrears at times $T_j$, $j = 1, \dots, n$, where $T_j - T_{j-1} = \delta_j$, $T_0 = T$, is priced by the formula

$$FC_t = \sum_{j=1}^{n} \delta_j B(t, T_j) \Big( L(t, T_{j-1}) N\big( \hat{e}_1^j(t) \big) - \kappa N\big( \hat{e}_2^j(t) \big) \Big), \tag{2.49}$$

where for every $j = 1, \dots, n$

$$\hat{e}_{1,2}^j(t) = \frac{\ln(L(t, T_{j-1})/\kappa) \pm \frac{1}{2} \hat{v}_j^2(t)}{\hat{v}_j(t)} \tag{2.50}$$

and $\hat{v}_j^2(t) = (T_{j-1} - t)\, \sigma_j^2$ for some constants $\sigma_j$, $j = 1, \dots, n$. Apparently, the market assumes that for any maturity $T_j$, the corresponding forward Libor rate has a lognormal probability law under the "market probability". The value of a floor can be easily derived by combining (2.49)–(2.50) with the cap–floor parity relationship (2.45). As we shall see in what follows, the valuation formulae obtained for caps and floors in the lognormal model of forward Libor rates agree with the market practice.

2.4.2 Valuation in the lognormal model of forward Libor rates

We shall now examine the valuation of caps within the lognormal model of forward Libor rates of Section 2.2.3. The dynamics of the forward Libor rate $L(t, T_{j-1})$ under the forward probability measure $\mathbb{P}_{T_j}$ are

$$dL(t, T_{j-1}) = L(t, T_{j-1})\, \lambda(t, T_{j-1}) \cdot dW_t^{T_j}, \tag{2.51}$$

where $W^{T_j}$ follows a $d$-dimensional Brownian motion under the forward measure $\mathbb{P}_{T_j}$, and $\lambda(\cdot, T_{j-1}) : [0, T_{j-1}] \to \mathbb{R}^d$ is a deterministic function. Consequently, for every $t \in [0, T_{j-1}]$ we have

$$L(t, T_{j-1}) = L(0, T_{j-1})\, \mathcal{E}_t\Big( \int_0^{\cdot} \lambda(u, T_{j-1}) \cdot dW_u^{T_j} \Big).$$

In the present setup, the cap valuation formula (2.52) was first established by Miltersen et al. (1997), who focused on the dynamics of the forward Libor rate for a given date. Equality (2.52) was subsequently rederived through a probabilistic approach in Goldys (1997) and Rady (1997). Finally, the same result was established by means of the forward measure approach in Brace et al. (1997). The following proposition is a consequence of formula (2.41), combined with the dynamics (2.51). As before, $N$ is the standard Gaussian cumulative distribution function.

Proposition 2.9 Consider an interest rate cap with strike level $\kappa$, settled in arrears at times $T_j$, $j = 1, \dots, n$. Assuming the lognormal model of Libor rates, the price of a cap at time $t \in [0, T]$ equals

$$FC_t = \sum_{j=1}^{n} \delta_j B(t, T_j) \Big( L(t, T_{j-1}) N\big( \tilde{e}_1^j(t) \big) - \kappa N\big( \tilde{e}_2^j(t) \big) \Big) = \sum_{j=1}^{n} FC_t^j, \tag{2.52}$$

where $FC_t^j$ stands for the price at time $t$ of the $j$th caplet for $j = 1, \dots, n$,

$$\tilde{e}_{1,2}^j(t) = \frac{\ln(L(t, T_{j-1})/\kappa) \pm \frac{1}{2} \tilde{v}_j^2(t)}{\tilde{v}_j(t)}$$

and

$$\tilde{v}_j^2(t) = \int_t^{T_{j-1}} |\lambda(u, T_{j-1})|^2\, du.$$
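Formula (2.52) translates directly into code. The sketch below is a minimal illustration rather than a production pricer: the function names are hypothetical, and the caller is assumed to supply the total volatility $\tilde{v}_j(t)$ for each caplet (for a constant scalar volatility $\sigma_j$ this is $\sigma_j \sqrt{T_{j-1} - t}$). A floorlet is included so that the caplet–floorlet parity can be checked.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _e1_e2(libor, strike, total_vol):
    """The pair (e_1, e_2) of Proposition 2.9; total_vol must be strictly positive."""
    e1 = (math.log(libor / strike) + 0.5 * total_vol ** 2) / total_vol
    return e1, e1 - total_vol


def caplet_price(bond, libor, strike, delta, total_vol):
    """delta * B(t, T_j) * (L N(e_1) - kappa N(e_2)), one term of (2.52)."""
    e1, e2 = _e1_e2(libor, strike, total_vol)
    return delta * bond * (libor * norm_cdf(e1) - strike * norm_cdf(e2))


def floorlet_price(bond, libor, strike, delta, total_vol):
    """Matching floorlet: delta * B(t, T_j) * (kappa N(-e_2) - L N(-e_1))."""
    e1, e2 = _e1_e2(libor, strike, total_vol)
    return delta * bond * (strike * norm_cdf(-e2) - libor * norm_cdf(-e1))


def cap_price(bonds, libors, strike, deltas, total_vols):
    """Sum of caplet prices; all lists are indexed by the caplet number."""
    return sum(
        caplet_price(b, l, strike, d, v)
        for b, l, d, v in zip(bonds, libors, deltas, total_vols)
    )
```

The difference `caplet_price(...) - floorlet_price(...)` equals `delta * bond * (libor - strike)` for any positive total volatility, which provides a model-independent check of an implementation.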

Proof We fix $j$ and we consider the $j$th caplet. It is clear that its payoff at time $T_j$ admits the representation

$$FC_{T_j}^j = \delta_j (L(T_{j-1}) - \kappa)^+ = \delta_j L(T_{j-1})\, \mathbb{1}_D - \delta_j \kappa\, \mathbb{1}_D, \tag{2.53}$$

where $D = \{ L(T_{j-1}) > \kappa \}$ is the exercise set. Since the caplet settles at time $T_j$, it is convenient to use the forward measure $\mathbb{P}_{T_j}$ to find its arbitrage price. We have

$$FC_t^j = B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( FC_{T_j}^j \mid \mathcal{F}_t \big), \quad \forall\, t \in [0, T_j].$$

Obviously, it is enough to find the value of a caplet for $t \in [0, T_{j-1}]$. In view of (2.53), it is clear that we need to evaluate the following conditional expectations:

$$FC_t^j = \delta_j B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}}\big( L(T_{j-1})\, \mathbb{1}_D \mid \mathcal{F}_t \big) - \kappa \delta_j B(t, T_j)\, \mathbb{P}_{T_j}(D \mid \mathcal{F}_t) = \delta_j B(t, T_j)(I_1 - I_2),$$

where the meaning of $I_1$ and $I_2$ is obvious from the context. Recall that $L(T_{j-1})$ is given by the formula

$$L(T_{j-1}) = L(t, T_{j-1}) \exp\Big( \int_t^{T_{j-1}} \lambda(u, T_{j-1}) \cdot dW_u^{T_j} - \frac{1}{2} \int_t^{T_{j-1}} |\lambda(u, T_{j-1})|^2\, du \Big).$$

Since $\lambda(\cdot, T_{j-1})$ is a deterministic function, the probability law under $\mathbb{P}_{T_j}$ of the Itô integral

$$\zeta(t, T_{j-1}) = \int_t^{T_{j-1}} \lambda(u, T_{j-1}) \cdot dW_u^{T_j}$$

is Gaussian, with zero mean and the variance

$$\mathrm{Var}_{\mathbb{P}_{T_j}}\big( \zeta(t, T_{j-1}) \big) = \int_t^{T_{j-1}} |\lambda(u, T_{j-1})|^2\, du.$$

Therefore, it is straightforward to show that12

$$I_2 = \kappa N\bigg( \frac{\ln L(t, T_{j-1}) - \ln \kappa - \frac{1}{2} \tilde{v}_j^2(t)}{\tilde{v}_j(t)} \bigg).$$

To evaluate $I_1$, we introduce an auxiliary probability measure $\hat{\mathbb{P}}_{T_j}$, equivalent to $\mathbb{P}_{T_j}$ on $(\Omega, \mathcal{F}_{T_{j-1}})$, by setting

$$\frac{d\hat{\mathbb{P}}_{T_j}}{d\mathbb{P}_{T_j}} = \mathcal{E}_{T_{j-1}}\Big( \int_0^{\cdot} \lambda(u, T_{j-1}) \cdot dW_u^{T_j} \Big).$$

Then the process $\hat{W}^{T_j}$ given by the formula

$$\hat{W}_t^{T_j} = W_t^{T_j} - \int_0^t \lambda(u, T_{j-1})\, du, \quad \forall\, t \in [0, T_{j-1}],$$

follows the $d$-dimensional standard Brownian motion under $\hat{\mathbb{P}}_{T_j}$. Furthermore, the rate $L(T_{j-1})$ admits the following representation under $\hat{\mathbb{P}}_{T_j}$, for $t \in [0, T_{j-1}]$,

$$L(T_{j-1}) = L(t, T_{j-1}) \exp\Big( \int_t^{T_{j-1}} \lambda_{j-1}(u) \cdot d\hat{W}_u^{T_j} + \frac{1}{2} \int_t^{T_{j-1}} |\lambda_{j-1}(u)|^2\, du \Big),$$

where we set $\lambda_{j-1}(u) = \lambda(u, T_{j-1})$. Since

$$I_1 = L(t, T_{j-1})\, \mathbb{E}_{\mathbb{P}_{T_j}}\Big( \mathbb{1}_D \exp\Big( \int_t^{T_{j-1}} \lambda_{j-1}(u) \cdot dW_u^{T_j} - \frac{1}{2} \int_t^{T_{j-1}} |\lambda_{j-1}(u)|^2\, du \Big) \,\Big|\, \mathcal{F}_t \Big),$$

from the abstract Bayes rule, we get $I_1 = L(t, T_{j-1})\, \hat{\mathbb{P}}_{T_j}(D \mid \mathcal{F}_t)$. Arguing in much the same way as for $I_2$, we thus obtain

$$I_1 = L(t, T_{j-1})\, N\bigg( \frac{\ln L(t, T_{j-1}) - \ln \kappa + \frac{1}{2} \tilde{v}_j^2(t)}{\tilde{v}_j(t)} \bigg).$$

This completes the proof of the proposition.

12 See, for instance, the proof of the Black–Scholes formula in Musiela and Rutkowski (1997a).


Once again, to derive the floors valuation formula, it is enough to make use of the cap–floor parity (2.45).

2.4.3 Hedging of caps and floors

It is clear that the replicating strategy for a cap is a simple sum of replicating strategies for caplets. Therefore, it is enough to focus on a particular caplet. Let us denote by $F_C(t, T_j)$ the forward price of the $j$th caplet for the settlement date $T_j$. From (2.52), it is clear that

$$F_C(t, T_j) = \delta_j \Big( L(t, T_{j-1}) N\big( \tilde{e}_1^j(t) \big) - \kappa N\big( \tilde{e}_2^j(t) \big) \Big),$$

so that an application of Itô's formula yields13

$$dF_C(t, T_j) = \delta_j N\big( \tilde{e}_1^j(t) \big)\, dL(t, T_{j-1}). \tag{2.54}$$
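Relation (2.54) says that the sensitivity of the forward caplet price to the underlying forward Libor rate is $\delta_j N(\tilde{e}_1^j(t))$. A minimal numerical check, assuming hypothetical function names and a central finite difference as the comparison device (neither comes from the text), could look as follows.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def forward_caplet_price(libor, strike, delta, total_vol):
    """F_C(t, T_j) = delta * (L N(e_1) - kappa N(e_2)), with total_vol > 0."""
    e1 = (math.log(libor / strike) + 0.5 * total_vol ** 2) / total_vol
    e2 = e1 - total_vol
    return delta * (libor * norm_cdf(e1) - strike * norm_cdf(e2))


def hedge_ratio(libor, strike, total_vol):
    """psi = N(e_1): number of forward rate agreements held, as suggested by (2.54)."""
    e1 = (math.log(libor / strike) + 0.5 * total_vol ** 2) / total_vol
    return norm_cdf(e1)


def finite_difference_delta(libor, strike, delta, total_vol, bump=1e-6):
    """Central difference of the forward caplet price with respect to the Libor rate."""
    up = forward_caplet_price(libor + bump, strike, delta, total_vol)
    down = forward_caplet_price(libor - bump, strike, delta, total_vol)
    return (up - down) / (2.0 * bump)
```

The finite-difference value should agree with `delta * hedge_ratio(...)` up to discretization error, mirroring the fact that the terms involving the derivatives of $\tilde{e}_1^j$ and $\tilde{e}_2^j$ cancel, exactly as in the classic Black–Scholes model.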

Let us consider the following self-financing trading strategy in the T j -forward mar14 ket. We start our trade at time 0 with F Cj(0, T j ) units of zero-coupon bonds. At j any time t ≤ T j−1 we assume ψ t = N e˜1 (t) positions in forward rate agreements (that is, single-period forward swaps) over the period [T j−1 , T j ]. The associated gains/losses process V , in the T j forward market,15 satisfies16 j j d Vt = δ j ψ t d L(t, T j−1 ) = δ j N e˜1 (t) d L(t, T j−1 ) = d FC (t, T ) with V0 = 0. Consequently,

T j−1

FC (T j−1 , T j ) = FC (0, T j ) +

j

δ j ψ t d L(t, T j−1 ) = FC (0, T j ) + VT j−1 .

0

It should be stressed that dynamic trading takes place on the interval [0, T j−1 ] only, the gains/losses (involving the initial investment) are incurred at time T j , however. All quantities in the last formula are expressed in units of T j -maturity zero-coupon bonds. Also, the caplet’s payoff is known already at time T j−1 , so that it is j completely specified by its forward price FC (T j−1 , T j ) = FC T j−1 /B(T j−1 , T j ). Therefore the last equality makes it clear that the strategy ψ introduced above does indeed replicate the j th caplet. It should be observed that formally the replicating strategy has also second comj ponent, ηt say, which represents the number of forward contracts on a T j -maturity bond, with the settlement date T j . Since obviously FB (t, T j , T j ) = 1 for every t ≤ T j , so that d FB (t, T j , T j ) = 0, for the T j -forward value of our strategy, we get 13 The calculations here are essentially the same as in the classic Black–Scholes model. 14 We need thus to invest FC j = F (0, T )B(0, T ) of cash at time 0. C j j 0 15 That is, with the value expressed in units of T -maturity zero-coupon bonds. j 16 To get a more intuitive insight in this formula, it is advisable to consider first a discretized version of ψ.


M. Rutkowski

$$\tilde V_t(\psi^j, \eta^j) = \eta^j_t = F_C(t, T_j)$$

and

$$d\tilde V_t(\psi^j, \eta^j) = \psi^j_t \delta_j\, dL(t, T_{j-1}) + \eta^j_t\, dF_B(t, T_j, T_j) = \delta_j N\big(\tilde e^j_1(t)\big)\, dL(t, T_{j-1}).$$

It should be stressed, however, that with the exception of the initial investment at time 0 in $T_j$-maturity bonds, no trading in bonds is required for the caplet's replication. In practical terms, the hedging of a cap within the framework of the lognormal model of forward Libor rates is done exclusively through dynamic trading in the underlying single-period swaps. Of course, the same remarks (and similar calculations) apply also to floors. In this interpretation, the component $\eta^j$ simply represents the future (i.e., as of time $T_{j-1}$) effects of continuous trading in forward contracts.

Alternatively, the hedging of a cap can be done in the spot (i.e., cash) market, using two simple portfolios of bonds. Indeed, it is easily seen that for the process

$$V_t(\psi^j, \eta^j) = B(t, T_j)\, \tilde V_t(\psi^j, \eta^j) = FC^j_t$$

we have

$$V_t(\psi^j, \eta^j) = \psi^j_t \big( B(t, T_{j-1}) - B(t, T_j) \big) + \eta^j_t B(t, T_j)$$

and

$$dV_t(\psi^j, \eta^j) = \psi^j_t\, d\big( B(t, T_{j-1}) - B(t, T_j) \big) + \eta^j_t\, dB(t, T_j) = N\big(\tilde e^j_1(t)\big)\, d\big( B(t, T_{j-1}) - B(t, T_j) \big) + \eta^j_t\, dB(t, T_j).$$

This means that the components $\psi^j$ and $\eta^j$ now represent the number of units of portfolios $B(t, T_{j-1}) - B(t, T_j)$ and $B(t, T_j)$ held at time $t$.

2.4.4 Bond options

We shall now give the bond option valuation formula within the framework of the lognormal model of forward Libor rates. This result was first obtained by Rady and Sandmann (1994), who adopted the PDE approach and who worked in a different setup (see also Goldys (1997), Miltersen et al. (1997), and Rady (1997)). In the present framework, it is an immediate consequence of (2.52) combined with (2.42).

Proposition 2.10 The price $C_t$ at time $t \le T_{j-1}$ of a European call option, with expiration date $T_{j-1}$ and strike price $0 < K < 1$, written on a zero-coupon bond maturing at $T_j = T_{j-1} + \delta_j$, equals

$$C_t = (1 - K) B(t, T_j)\, N\big(l^j_1(t)\big) - K \big( B(t, T_{j-1}) - B(t, T_j) \big) N\big(l^j_2(t)\big), \qquad (2.55)$$

where

$$l^j_{1,2}(t) = \frac{\ln\big((1 - K) B(t, T_j)\big) - \ln\big(K (B(t, T_{j-1}) - B(t, T_j))\big) \pm \frac{1}{2} \tilde v_j^2(t)}{\tilde v_j(t)}$$

10. Modelling of Forward Libor and Swap Rates

and

$$\tilde v_j^2(t) = \int_t^{T_{j-1}} |\lambda(u, T_{j-1})|^2\, du.$$
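As a rough numerical check of (2.55), the following sketch evaluates the closed-form price and compares it with a direct quadrature of the option payoff. It assumes that $\delta_j L(T_{j-1}, T_{j-1})$ is lognormal under the $T_j$-forward measure with total standard deviation `v` (playing the role of $\tilde v_j(t)$). The function names and the input values are illustrative only.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bond_call_price(B_prev, B_mat, K, v):
    """Formula (2.55): call on the T_j-bond, expiring at T_{j-1}.

    B_prev = B(t, T_{j-1}), B_mat = B(t, T_j) with B_prev > B_mat,
    0 < K < 1, and v > 0 is the integrated volatility to expiry.
    """
    a = (1.0 - K) * B_mat        # multiplies N(l1)
    b = K * (B_prev - B_mat)     # multiplies N(l2)
    l1 = (math.log(a) - math.log(b) + 0.5 * v * v) / v
    l2 = l1 - v
    return a * norm_cdf(l1) - b * norm_cdf(l2)


def bond_call_by_quadrature(B_prev, B_mat, K, v, n=20000, z_max=10.0):
    """Same price by midpoint integration against the standard normal density.

    Under the T_j-forward measure, delta_j * L at the reset date is taken to be
    x0 * exp(v z - v^2 / 2) with x0 = (B_prev - B_mat) / B_mat and z ~ N(0, 1).
    """
    x0 = (B_prev - B_mat) / B_mat
    h = 2.0 * z_max / n
    total = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * h
        x = x0 * math.exp(v * z - 0.5 * v * v)
        # Payoff (B(T_{j-1}, T_j) - K)^+ expressed in units of the T_j-bond.
        payoff = max(1.0 - K * (1.0 + x), 0.0)
        total += payoff * math.exp(-0.5 * z * z)
    return B_mat * total * h / math.sqrt(2.0 * math.pi)


closed_form = bond_call_price(0.95, 0.93, 0.975, 0.2)
numerical = bond_call_by_quadrature(0.95, 0.93, 0.975, 0.2)
print(closed_form, numerical)
```

The rewriting of the payoff as $(1 - K(1 + \delta_j L))^+$ shows why the bond call is a put on the Libor rate.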

In view of (2.55), it is apparent that the replication of the bond option using the underlying bonds of maturity $T_{j-1}$ and $T_j$ is rather involved. This should be contrasted with the case of the Gaussian Heath–Jarrow–Morton model$^{17}$ in which hedging of bond options with the use of the underlying bonds is straightforward. This illustrates the general feature that each particular way of modelling the term structure is tailored to a specific class of derivatives and hedging instruments.

17 In such a model the forward prices of bonds follow lognormal processes.

3 Modelling of forward swap rates

We shall first describe the most typical swap contracts and related options (the so-called swaptions). Subsequently, we shall present a model of forward swap rates put forward by Jamshidian (1996, 1997). For the sake of expositional convenience, however, we shall follow the backward induction approach due to Rutkowski (1999).

3.1 Interest rate swaps

Let us consider a forward (start) payer swap (that is, fixed-for-floating interest rate swap) settled in arrears, with notional principal $N_p$. As before, we consider a finite collection of dates $0 < T_0 < T_1 < \cdots < T_n$ so that $\delta_j = T_j - T_{j-1} > 0$ for every $j = 1, \dots, n$. The floating rate $L(T_{j-1})$ received at time $T_j$ is set at time $T_{j-1}$ by reference to the price of a zero-coupon bond over the period $[T_{j-1}, T_j]$. More specifically, $L(T_{j-1})$ is the spot Libor rate prevailing at time $T_{j-1}$, so that it satisfies

$$B(T_{j-1}, T_j)^{-1} = 1 + (T_j - T_{j-1}) L(T_{j-1}) = 1 + \delta_j L(T_{j-1}). \qquad (3.1)$$

Recall that in general, the forward Libor rate $L(t, T_{j-1})$ for the future time period $[T_{j-1}, T_j]$ of length $\delta_j$ satisfies

$$1 + \delta_j L(t, T_{j-1}) = \frac{B(t, T_{j-1})}{B(t, T_j)} = F_B(t, T_{j-1}, T_j), \qquad (3.2)$$

so that $L(T_{j-1})$ coincides with $L(T_{j-1}, T_{j-1})$. At any date $T_j$, $j = 1, \dots, n$, the cash flows of a forward payer swap are $N_p L(T_{j-1}) \delta_j$ and $-N_p \kappa \delta_j$, where $\kappa$ is a preassigned fixed rate of interest (the cash flows of a forward receiver swap have the same size, but opposite signs). The number $n$, which coincides with the number of payments, is referred to as the length of a swap (for instance, the length of a three-year swap with quarterly settlement equals $n = 12$). The dates $T_0, \dots, T_{n-1}$ are known as reset dates, and the dates $T_1, \dots, T_n$ as settlement dates. We shall refer to the first reset date $T_0$ as the start date of a swap. Finally, the time interval $[T_{j-1}, T_j]$ is referred to as the $j$th accrual period. We may and do assume, without loss of generality, that the notional principal $N_p = 1$.

The value at time $t$ of a forward payer swap, which is denoted by $FS_t$ or $FS_t(\kappa)$, equals

$$FS_t(\kappa) = \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_t}{B_{T_j}} \big( L(T_{j-1}) - \kappa \big) \delta_j \,\Big|\, \mathcal{F}_t \Big). \qquad (3.3)$$

Since

$$L(t, T_{j-1}) = \frac{B(t, T_{j-1}) - B(t, T_j)}{\delta_j B(t, T_j)},$$

it is clear that the process $L(\cdot, T_{j-1})$ follows a martingale under the forward martingale measure $\mathbb{P}_{T_j}$. Therefore

$$FS_t(\kappa) = \sum_{j=1}^{n} B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}} \big( (L(T_{j-1}) - \kappa) \delta_j \,\big|\, \mathcal{F}_t \big) = \sum_{j=1}^{n} B(t, T_j) \big( L(t, T_{j-1}) - \kappa \big) \delta_j = \sum_{j=1}^{n} \big( B(t, T_{j-1}) - B(t, T_j) - \kappa \delta_j B(t, T_j) \big).$$

After rearranging, this yields

$$FS_t(\kappa) = B(t, T_0) - \sum_{j=1}^{n} c_j B(t, T_j) \qquad (3.4)$$

for every $t \in [0, T]$, where $c_j = \kappa \delta_j$ for $j = 1, \dots, n - 1$, and $c_n = \tilde\delta_n = 1 + \kappa \delta_n$. The last equality makes clear that a forward payer swap settled in arrears is, essentially, a contract to deliver a specific coupon-bearing bond and to receive at the same time a zero-coupon bond. Relationship (3.4) may also be established through a straightforward comparison of the future cash flows from these bonds. Note that (3.4) provides a simple method for the replication of a swap contract, independent of the term structure model.

In the forward payer swap settled in advance (that is, in which each reset date is also a settlement date) the discounting method varies from country to country. In the U.S. and in many European markets, the cash flows of a swap settled in advance at reset dates $T_j$, $j = 0, \dots, n - 1$, are $L(T_j) \delta_{j+1} (1 + L(T_j) \delta_{j+1})^{-1}$ and $-\kappa \delta_{j+1} (1 + L(T_j) \delta_{j+1})^{-1}$. Therefore the value $FS^{**}_t(\kappa)$ at time $t$ of this swap is

$$FS^{**}_t(\kappa) = \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=0}^{n-1} \frac{B_t}{B_{T_j}} \frac{\delta_{j+1} (L(T_j) - \kappa)}{1 + \delta_{j+1} L(T_j)} \,\Big|\, \mathcal{F}_t \Big) = \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=0}^{n-1} \frac{B_t}{B_{T_j}} (L(T_j) - \kappa) \delta_{j+1} B(T_j, T_{j+1}) \,\Big|\, \mathcal{F}_t \Big) = \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=0}^{n-1} \frac{B_t}{B_{T_{j+1}}} (L(T_j) - \kappa) \delta_{j+1} \,\Big|\, \mathcal{F}_t \Big),$$

which coincides with the value of the swap settled in arrears. Once again, this is by no means surprising, since the payoffs $L(T_j) \delta_{j+1} (1 + L(T_j) \delta_{j+1})^{-1}$ and $-\kappa \delta_{j+1} (1 + L(T_j) \delta_{j+1})^{-1}$ at time $T_j$ are easily seen to be equivalent to payoffs $L(T_j) \delta_{j+1}$ and $-\kappa \delta_{j+1}$ respectively at time $T_{j+1}$ (recall that $1 + L(T_j) \delta_{j+1} = B^{-1}(T_j, T_{j+1})$). In what follows, we shall restrict our attention to interest rate swaps settled in arrears.

As mentioned, a swap agreement is worthless at initiation. This important feature of a swap leads to the following definition, which refers in fact to the more general concept of a forward swap. Basically, a forward swap rate is that fixed rate of interest which makes a forward swap worthless.

Definition 3.1 The forward swap rate $\kappa(t, T_0, n)$ at time $t$ for the date $T_0$ is that value of the fixed rate $\kappa$ which makes the value of the forward swap zero, i.e., that value of $\kappa$ for which $FS_t(\kappa) = 0$. Using (3.4), we obtain

$$\kappa(t, T_0, n) = \big( B(t, T_0) - B(t, T_n) \big) \Big( \sum_{j=1}^{n} \delta_j B(t, T_j) \Big)^{-1}. \qquad (3.5)$$
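Formulas (3.4) and (3.5) only need a discount curve, so they are easy to sketch in code. The example below is a minimal illustration, assuming a hypothetical flat continuously compounded 4% curve and semi-annual accrual periods; the function names and the list layout are not from the text.

```python
import math


def swap_value(kappa, bonds, deltas):
    """Value (3.4) of a forward payer swap with unit notional.

    bonds  = [B(t, T_0), B(t, T_1), ..., B(t, T_n)]
    deltas = [delta_1, ..., delta_n]
    """
    coupons = [kappa * d for d in deltas]
    coupons[-1] += 1.0  # c_n = 1 + kappa * delta_n
    return bonds[0] - sum(c * b for c, b in zip(coupons, bonds[1:]))


def forward_swap_rate(bonds, deltas):
    """Forward swap rate (3.5): the kappa that makes the swap worthless."""
    annuity = sum(d * b for d, b in zip(deltas, bonds[1:]))
    return (bonds[0] - bonds[-1]) / annuity


def swap_value_from_libors(kappa, bonds, deltas):
    """Same value as a sum of discounted (forward Libor - kappa) * delta_j."""
    total = 0.0
    for j, d in enumerate(deltas, start=1):
        libor = (bonds[j - 1] / bonds[j] - 1.0) / d  # relation (3.2)
        total += bonds[j] * (libor - kappa) * d
    return total


# Hypothetical inputs: flat 4% curve, swap starting in one year, four periods.
dates = [1.0, 1.5, 2.0, 2.5, 3.0]
bonds = [math.exp(-0.04 * T) for T in dates]
deltas = [0.5, 0.5, 0.5, 0.5]

rate = forward_swap_rate(bonds, deltas)
print("forward swap rate:", rate)
print("value at that rate:", swap_value(rate, bonds, deltas))
```

On a flat curve all forward Libor rates coincide, so the forward swap rate equals the common Libor rate; on a general curve it is an annuity-weighted average of them.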

A swap (swap rate, respectively) is the forward swap (forward swap rate, respectively) with $t = T_0$. The swap rate, $\kappa(T_0, T_0, n)$, equals

$$\kappa(T_0, T_0, n) = \big( 1 - B(T_0, T_n) \big) \Big( \sum_{j=1}^{n} \delta_j B(T_0, T_j) \Big)^{-1}. \qquad (3.6)$$

Note that the definition of a forward swap rate implicitly refers to a swap contract of length $n$ which starts at time $T_0$. It would thus be more correct to refer to $\kappa(t, T_0, n)$ as the $n$-period forward swap rate prevailing at time $t$, for the future date $T_0$. A forward swap rate is a rather theoretical concept, as opposed to swap rates, which are quoted daily (subject to an appropriate bid–ask spread) by financial institutions that offer interest rate swap contracts to their institutional clients. In practice, swap agreements of various lengths are offered. Also, typically, the length of the reference period varies over time; for instance, a five-year swap may be settled quarterly during the first three years, and semi-annually during the last two. Swap rates also play an important role as a basis for several derivative instruments. For instance, an appropriate swap rate is commonly used as a strike level for an option written on the value of a swap; that is, a swaption.

Finally, it will be useful to express the value at time $t$ of a given forward swap with fixed rate $\kappa$ in terms of the current value of the forward swap rate. Since obviously $FS_t(\kappa(t, T_0, n)) = 0$, using (3.4), we get

$$FS_t(\kappa) = FS_t(\kappa) - FS_t(\kappa(t, T_0, n)) = \sum_{j=1}^{n} \big( \kappa(t, T_0, n) - \kappa \big) \delta_j B(t, T_j). \qquad (3.7)$$

3.2 The lognormal model of forward swap rates

The lognormal model of forward swap rates was developed by Jamshidian (1996, 1997). In this section, we follow Rutkowski (1999). We assume, as before, that the tenor structure $0 < T_0 < T_1 < \cdots < T_n = T^*$ is given. Recall that $\delta_j = T_j - T_{j-1}$ for $j = 1, \dots, n$, and thus $T_j = \sum_{i=0}^{j} \delta_i$ for every $j = 0, \dots, n$ (so that, implicitly, $\delta_0 = T_0$). For any fixed $j$, we consider a fixed-for-floating forward (payer) swap which starts at time $T_j$ and has $n - j$ accrual periods, whose consecutive lengths are $\delta_{j+1}, \dots, \delta_n$. The fixed interest rate paid at each of the reset dates $T_l$ for $l = j + 1, \dots, n$ equals $\kappa$, and the corresponding floating rate, $L(T_l)$, is found using the formula

$$B(T_l, T_{l+1})^{-1} = 1 + (T_{l+1} - T_l) L(T_l) = 1 + \delta_{l+1} L(T_l),$$

i.e., it coincides with the Libor rate $L(T_l, T_l)$. It is not difficult to check, using no-arbitrage arguments, that the value of such a swap equals, for $t \in [0, T_j]$ (by convention, the notional principal equals 1),

$$FS_t(\kappa) = B(t, T_j) - \sum_{l=j+1}^{n} c_l B(t, T_l),$$

where $c_l = \kappa \delta_l$ for $l = j + 1, \dots, n - 1$, and $c_n = 1 + \kappa \delta_n$. Consequently, the associated forward swap rate, $\kappa(t, T_j, n - j)$, that is, that value of a fixed rate $\kappa$ for which such a swap is worthless at time $t$, is given by the formula

$$\kappa(t, T_j, n - j) = \frac{B(t, T_j) - B(t, T_n)}{\delta_{j+1} B(t, T_{j+1}) + \cdots + \delta_n B(t, T_n)} \qquad (3.8)$$

for every $t \in [0, T_j]$, $j = 0, \dots, n - 1$. In this section, we consider the family of forward swap rates $\tilde\kappa(t, T_j) = \kappa(t, T_j, n - j)$ for $j = 0, \dots, n - 1$. Let us stress that the underlying swap agreements differ in length; however, they all have a common expiration date, $T^* = T_n$.

Suppose momentarily that we are given a family of bond prices $B(t, T_m)$, $m = 1, \dots, n$, on a filtered probability space $(\Omega, \mathbb{F}, \mathbb{P})$ equipped with a Brownian motion $W$. As in Section 2.1, we find it convenient to postulate that $\mathbb{P} = \mathbb{P}_{T^*}$ is the forward measure for the date $T^*$, and the process $W = W^{T^*}$ is the corresponding Brownian motion. For any $m = 1, \dots, n - 1$, we introduce the fixed-maturity coupon process $G(m)$ by setting (recall that $T^*_l = T_{n-l}$, in particular, $T^*_0 = T_n$)

$$G_t(m) = \sum_{l=n-m+1}^{n} \delta_l B(t, T_l) = \sum_{k=0}^{m-1} \delta_{n-k} B(t, T^*_k) \qquad (3.9)$$

for $t \in [0, T_{n-m+1}]$. A forward swap measure is that probability measure, equivalent to $\mathbb{P}$, which corresponds to the choice of the fixed-maturity coupon process as a numeraire asset. We have the following definition.

Definition 3.2 For $j = 0, \dots, n$, a probability measure $\tilde{\mathbb{P}}_{T_j}$ on $(\Omega, \mathcal{F}_{T_j})$, equivalent to $\mathbb{P}$, is said to be the fixed-maturity forward swap measure for the date $T_j$ if, for every $k = 0, \dots, n$, the relative bond price

$$Z_{n-j+1}(t, T_k) := \frac{B(t, T_k)}{G_t(n - j + 1)} = \frac{B(t, T_k)}{\delta_j B(t, T_j) + \cdots + \delta_n B(t, T_n)},$$

$t \in [0, T_k \wedge T_j]$, follows a local martingale under $\tilde{\mathbb{P}}_{T_j}$.

Put another way, for any fixed $m = 1, \dots, n + 1$, the relative bond prices

$$Z_m(t, T^*_k) = \frac{B(t, T^*_k)}{G_t(m)} = \frac{B(t, T^*_k)}{\delta_{n-m+1} B(t, T^*_{m-1}) + \cdots + \delta_n B(t, T^*)},$$

$t \in [0, T^*_k \wedge T^*_{m-1}]$, are bound to follow local martingales under the forward swap measure $\tilde{\mathbb{P}}_{T^*_{m-1}}$. It follows immediately from (3.8) that the forward swap rate for the date $T^*_m$ equals, for $t \in [0, T^*_m]$,

$$\tilde\kappa(t, T^*_m) = \frac{B(t, T^*_m) - B(t, T^*)}{\delta_{n-m+1} B(t, T^*_{m-1}) + \cdots + \delta_n B(t, T^*)},$$

or, equivalently,

$$\tilde\kappa(t, T^*_m) = Z_m(t, T^*_m) - Z_m(t, T^*).$$

Therefore $\tilde\kappa(\cdot, T^*_m)$ also follows a local martingale under the forward swap measure $\tilde{\mathbb{P}}_{T^*_{m-1}}$. Moreover, since obviously $G_t(1) = \delta_n B(t, T^*)$, it is evident that $Z_1(t, T^*_k) = \delta_n^{-1} F_B(t, T^*_k, T^*)$, and thus the probability measure $\tilde{\mathbb{P}}_{T^*}$ can be chosen to coincide with the forward martingale measure $\mathbb{P}_{T^*}$.

Our aim is to construct a model of forward swap rates through backward induction. As one might expect, the underlying bond price processes will not be explicitly specified. We make the following standing assumptions.


Assumptions (SR) We assume that we are given a family of bounded adapted processes $\nu(\cdot, T_j)$, $j = 0, \dots, n - 1$, which represent the volatilities of forward swap rates $\tilde\kappa(\cdot, T_j)$. In addition, we are given an initial term structure of interest rates, specified by a family $B(0, T_j)$, $j = 0, \dots, n$, of bond prices. We assume that $B(0, T_j) > B(0, T_{j+1})$ for $j = 0, \dots, n - 1$.

We wish to construct a family of forward swap rates in such a way that

$$d\tilde\kappa(t, T_j) = \tilde\kappa(t, T_j)\, \nu(t, T_j) \cdot d\tilde W^{T_{j+1}}_t \qquad (3.10)$$

for any $j = 0, \dots, n - 1$, where each process $\tilde W^{T_{j+1}}$ follows a standard Brownian motion under the corresponding forward swap measure $\tilde{\mathbb{P}}_{T_{j+1}}$. The model should also be consistent with the initial term structure of interest rates, meaning that

$$\tilde\kappa(0, T_j) = \frac{B(0, T_j) - B(0, T^*)}{\delta_{j+1} B(0, T_{j+1}) + \cdots + \delta_n B(0, T_n)}. \qquad (3.11)$$

We proceed by backward induction. The first step is to introduce the forward swap rate for the date $T^*_1$ by postulating that the forward swap rate $\tilde\kappa(\cdot, T^*_1)$ solves the SDE

$$d\tilde\kappa(t, T^*_1) = \tilde\kappa(t, T^*_1)\, \nu(t, T^*_1) \cdot d\tilde W^{T^*}_t, \quad \forall\, t \in [0, T^*_1], \qquad (3.12)$$

where $\tilde W^{T^*} = W^{T^*} = W$, with the initial condition

$$\tilde\kappa(0, T^*_1) = \frac{B(0, T^*_1) - B(0, T^*)}{\delta_n B(0, T^*)}.$$

To specify the process $\tilde\kappa(\cdot, T^*_2)$, we need first to introduce a forward swap measure $\tilde{\mathbb{P}}_{T^*_1}$ and an associated Brownian motion $\tilde W^{T^*_1}$. To this end, notice that each process $Z_1(\cdot, T^*_k) = B(\cdot, T^*_k) / \delta_n B(\cdot, T^*)$ follows a strictly positive local martingale under $\tilde{\mathbb{P}}_{T^*} = \mathbb{P}_{T^*}$. More specifically, we have

$$dZ_1(t, T^*_k) = Z_1(t, T^*_k)\, \gamma_1(t, T^*_k) \cdot d\tilde W^{T^*}_t \qquad (3.13)$$

for some adapted process $\gamma_1(\cdot, T^*_k)$. According to the definition of a fixed-maturity forward swap measure, we postulate that for every $k$ the process

$$Z_2(t, T^*_k) = \frac{B(t, T^*_k)}{\delta_{n-1} B(t, T^*_1) + \delta_n B(t, T^*)} = \frac{Z_1(t, T^*_k)}{1 + \delta_{n-1} Z_1(t, T^*_1)}$$

follows a local martingale under $\tilde{\mathbb{P}}_{T^*_1}$. Applying Lemma 2.3 to processes $G = Z_1(\cdot, T^*_k)$ and $H = \delta_{n-1} Z_1(\cdot, T^*_1)$, it is easy to see that for this property to hold, it suffices to assume that the process $\tilde W^{T^*_1}$, which is given by the formula

$$\tilde W^{T^*_1}_t = \tilde W^{T^*}_t - \int_0^t \frac{\delta_{n-1} Z_1(u, T^*_1)}{1 + \delta_{n-1} Z_1(u, T^*_1)}\, \gamma_1(u, T^*_1)\, du,$$

$t \in [0, T^*_1]$, follows a Brownian motion under $\tilde{\mathbb{P}}_{T^*_1}$ (the probability measure $\tilde{\mathbb{P}}_{T^*_1}$ is yet unspecified, but will soon be found through Girsanov's theorem). Note that

$$Z_1(t, T^*_1) = \frac{B(t, T^*_1)}{\delta_n B(t, T^*)} = \tilde\kappa(t, T^*_1) + Z_1(t, T^*) = \tilde\kappa(t, T^*_1) + \delta_n^{-1}.$$

Differentiating both sides of the last equality, we get (cf. (3.12) and (3.13))

$$Z_1(t, T^*_1)\, \gamma_1(t, T^*_1) = \tilde\kappa(t, T^*_1)\, \nu(t, T^*_1).$$

Consequently, $\tilde W^{T^*_1}$ is explicitly given by the formula

$$\tilde W^{T^*_1}_t = \tilde W^{T^*}_t - \int_0^t \frac{\delta_{n-1} \tilde\kappa(u, T^*_1)}{1 + \delta_{n-1} \delta_n^{-1} + \delta_{n-1} \tilde\kappa(u, T^*_1)}\, \nu(u, T^*_1)\, du$$

for $t \in [0, T^*_1]$. We are in a position to define, using Girsanov's theorem, the associated forward swap measure $\tilde{\mathbb{P}}_{T^*_1}$. Subsequently, we introduce the process $\tilde\kappa(\cdot, T^*_2)$ by postulating that it solves the SDE

$$d\tilde\kappa(t, T^*_2) = \tilde\kappa(t, T^*_2)\, \nu(t, T^*_2) \cdot d\tilde W^{T^*_1}_t$$

with the initial condition

$$\tilde\kappa(0, T^*_2) = \frac{B(0, T^*_2) - B(0, T^*)}{\delta_{n-1} B(0, T^*_1) + \delta_n B(0, T^*)}.$$

For the reader's convenience, let us consider one more inductive step, in which we are looking for $\tilde\kappa(t, T^*_3)$. We now consider processes

$$Z_3(t, T^*_k) = \frac{B(t, T^*_k)}{\delta_{n-2} B(t, T^*_2) + \delta_{n-1} B(t, T^*_1) + \delta_n B(t, T^*)} = \frac{Z_2(t, T^*_k)}{1 + \delta_{n-2} Z_2(t, T^*_2)},$$

so that

$$\tilde W^{T^*_2}_t = \tilde W^{T^*_1}_t - \int_0^t \frac{\delta_{n-2} Z_2(u, T^*_2)}{1 + \delta_{n-2} Z_2(u, T^*_2)}\, \gamma_2(u, T^*_2)\, du$$

for $t \in [0, T^*_2]$. It is useful to note that

$$Z_2(t, T^*_2) = \frac{B(t, T^*_2)}{\delta_{n-1} B(t, T^*_1) + \delta_n B(t, T^*)} = \tilde\kappa(t, T^*_2) + Z_2(t, T^*),$$

where in turn

$$Z_2(t, T^*) = \frac{Z_1(t, T^*)}{1 + \delta_{n-1} Z_1(t, T^*) + \delta_{n-1} \tilde\kappa(t, T^*_1)}$$

and the process $Z_1(\cdot, T^*)$ is already known from the previous step (clearly, $Z_1(\cdot, T^*) = 1/\delta_n$). Differentiating the last equality, we may thus find the volatility of the process $Z_2(\cdot, T^*)$, and consequently, define $\tilde{\mathbb{P}}_{T^*_2}$.


We now examine the general case. We proceed by induction with respect to $m$. Suppose that we have found forward swap rates $\tilde\kappa(\cdot, T^*_1), \dots, \tilde\kappa(\cdot, T^*_m)$, the forward swap measure $\tilde{\mathbb{P}}_{T^*_{m-1}}$ and the associated Brownian motion $\tilde W^{T^*_{m-1}}$. Our aim is to determine the forward swap measure $\tilde{\mathbb{P}}_{T^*_m}$, the associated Brownian motion $\tilde W^{T^*_m}$, and the forward swap rate $\tilde\kappa(\cdot, T^*_{m+1})$. To this end, we postulate that processes

$$Z_{m+1}(t, T^*_k) = \frac{B(t, T^*_k)}{G_t(m + 1)} = \frac{B(t, T^*_k)}{\delta_{n-m} B(t, T^*_m) + \cdots + \delta_n B(t, T^*)} = \frac{Z_m(t, T^*_k)}{1 + \delta_{n-m} Z_m(t, T^*_m)}$$

follow local martingales under $\tilde{\mathbb{P}}_{T^*_m}$. In view of Lemma 2.3, applied to processes $G = Z_m(\cdot, T^*_k)$ and $H = \delta_{n-m} Z_m(\cdot, T^*_m)$, it is clear that we may set

$$\tilde W^{T^*_m}_t = \tilde W^{T^*_{m-1}}_t - \int_0^t \frac{\delta_{n-m} Z_m(u, T^*_m)}{1 + \delta_{n-m} Z_m(u, T^*_m)}\, \gamma_m(u, T^*_m)\, du, \qquad (3.14)$$

for $t \in [0, T^*_m]$. Therefore it is sufficient to analyse the process

$$Z_m(t, T^*_m) = \frac{B(t, T^*_m)}{\delta_{n-m+1} B(t, T^*_{m-1}) + \cdots + \delta_n B(t, T^*)} = \tilde\kappa(t, T^*_m) + Z_m(t, T^*).$$

To conclude, it is enough to notice that

$$Z_m(t, T^*) = \frac{Z_{m-1}(t, T^*)}{1 + \delta_{n-m+1} Z_{m-1}(t, T^*) + \delta_{n-m+1} \tilde\kappa(t, T^*_{m-1})}.$$
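The last recursion can be checked numerically at a single time point: starting from $Z_1(t, T^*) = 1/\delta_n$, it rebuilds every $Z_m(t, T^*)$ from the forward swap rates alone. The sketch below is an illustration with made-up bond prices and accrual periods, and the function names and list layout are hypothetical. It compares the recursive values with the direct definition $Z_m(t, T^*) = B(t, T^*) / G_t(m)$.

```python
def coupon_process(bonds, deltas, m):
    """G_t(m) of (3.9): sum of delta_l * B(t, T_l) over l = n-m+1, ..., n.

    bonds  = [B(t, T_0), ..., B(t, T_n)], deltas = [delta_1, ..., delta_n].
    """
    n = len(deltas)
    return sum(deltas[l - 1] * bonds[l] for l in range(n - m + 1, n + 1))


def swap_rate(bonds, deltas, m):
    """Forward swap rate for the date T*_m = T_{n-m}, for m = 1, ..., n."""
    n = len(deltas)
    return (bonds[n - m] - bonds[n]) / coupon_process(bonds, deltas, m)


def z_terminal_recursive(swap_rates, deltas, m_max):
    """[Z_1(t,T*), ..., Z_{m_max}(t,T*)] built from swap rates only.

    swap_rates[m - 1] holds the swap rate for the date T*_m.
    """
    n = len(deltas)
    z = [1.0 / deltas[n - 1]]              # Z_1(t, T*) = 1 / delta_n
    for m in range(2, m_max + 1):
        d = deltas[n - m]                  # delta_{n-m+1}
        prev = z[-1]
        z.append(prev / (1.0 + d * prev + d * swap_rates[m - 2]))
    return z


# Hypothetical curve with n = 4 and unequal accrual periods.
bonds = [0.97, 0.955, 0.93, 0.91, 0.88]
deltas = [0.5, 0.5, 0.25, 1.0]
n = len(deltas)

rates = [swap_rate(bonds, deltas, m) for m in range(1, n)]
z_rec = z_terminal_recursive(rates, deltas, n)
z_direct = [bonds[n] / coupon_process(bonds, deltas, m) for m in range(1, n + 1)]
print(z_rec)
print(z_direct)
```

Agreement of the two lists reflects the fact that $Z_m(\cdot, T^*)$ is a rational function of the swap rates constructed in earlier steps, which is what allows the drift in (3.14) to be written without reference to bond prices.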

Indeed, from the preceding step, we know that the process $Z_{m-1}(\cdot, T^*)$ is a (rational) function of forward swap rates $\tilde\kappa(\cdot, T^*_1), \dots, \tilde\kappa(\cdot, T^*_{m-1})$. Consequently, the process under the integral sign on the right-hand side of (3.14) can be expressed using the terms $\tilde\kappa(\cdot, T^*_1), \dots, \tilde\kappa(\cdot, T^*_{m-1})$ and their volatilities (since the explicit formula is rather lengthy, it is not reported here). Having found the process $\tilde W^{T^*_m}$ and probability measure $\tilde{\mathbb{P}}_{T^*_m}$, we introduce the forward swap rate $\tilde\kappa(\cdot, T^*_{m+1})$ through (3.10)–(3.11), and so forth. If all volatilities are deterministic, the model is termed the lognormal model of fixed-maturity forward swap rates.

3.3 Valuation of swaptions

For a long time, Black's swaptions formula was merely a (widely used) practical tool to value swaptions. Indeed, the use of this formula was not supported by the existence of a reliable term structure model. Valuation and hedging of swaptions based on the suitable version of Black's formula was analysed, for instance, in Neuberger (1990). The formal derivation of this heuristic result within the framework of a well-established term structure model was first achieved in Jamshidian (1997).


3.3.1 Payer and receiver swaptions

The owner of a payer (receiver, respectively) swaption with strike rate $\kappa$, maturing at time $T = T_0$, has the right to enter at time $T$ the underlying forward payer (receiver, respectively) swap settled in arrears.$^{18}$ Because $FS_T(\kappa)$ is the value at time $T$ of the payer swap with the fixed interest rate $\kappa$, it is clear that the price of the payer swaption at time $t$ equals

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \big( FS_T(\kappa) \big)^+ \,\Big|\, \mathcal{F}_t \Big).$$

Using (3.3), we obtain

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (L(T_{j-1}) - \kappa) \delta_j \,\Big|\, \mathcal{F}_T \Big) \Big)^+ \,\Big|\, \mathcal{F}_t \Big). \qquad (3.15)$$

On the other hand, in view of (3.7) we also have

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa(T, T, n) - \kappa) \delta_j \,\Big|\, \mathcal{F}_T \Big) \Big)^+ \,\Big|\, \mathcal{F}_t \Big). \qquad (3.16)$$

The last equality yields

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa(T, T, n) - \kappa) \delta_j \,\Big|\, \mathcal{F}_T \Big) \Big)^+ \,\Big|\, \mathcal{F}_t \Big)$$
$$= \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T}\, \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa(T, T, n) - \kappa)^+ \delta_j \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big)$$
$$= \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \sum_{j=1}^{n} \delta_j B(T, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}} \big( (\kappa(T, T, n) - \kappa)^+ \,\big|\, \mathcal{F}_T \big) \,\Big|\, \mathcal{F}_t \Big)$$
$$= \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \sum_{j=1}^{n} \delta_j B(T, T_j) (\kappa(T, T, n) - \kappa)^+ \,\Big|\, \mathcal{F}_t \Big)$$
$$= \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( 1 - \sum_{j=1}^{n} c_j B(T, T_j) \Big)^+ \,\Big|\, \mathcal{F}_t \Big).$$

Similarly, for the receiver swaption, we have

$$RS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \big( -FS_T(\kappa) \big)^+ \,\Big|\, \mathcal{F}_t \Big),$$

that is

$$RS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa - L(T_{j-1})) \delta_j \,\Big|\, \mathcal{F}_T \Big) \Big)^+ \,\Big|\, \mathcal{F}_t \Big), \qquad (3.17)$$

where we write $RS_t$ to denote the price at time $t$ of a receiver swaption. Consequently, reasoning in much the same way as in the case of a payer swaption, we get

$$RS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa - \kappa(T, T, n)) \delta_j \,\Big|\, \mathcal{F}_T \Big) \Big)^+ \,\Big|\, \mathcal{F}_t \Big)$$
$$= \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T}\, \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa - \kappa(T, T, n))^+ \delta_j \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big)$$
$$= \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( \sum_{j=1}^{n} c_j B(T, T_j) - 1 \Big)^+ \,\Big|\, \mathcal{F}_t \Big).$$

We shall first focus on a payer swaption. In view of (3.15), it is apparent that a payer swaption is exercised at time $T$ if and only if the value of the underlying swap is positive at this date. It should be made clear that a swaption may be exercised by its owner only at its maturity date $T$. If exercised, a swaption gives rise to a sequence of cash flows at prescribed future dates. By considering the future cash flows from a swaption and from the corresponding market swap$^{19}$ available at time $T$, it is easily seen that the owner of a swaption is protected against the adverse movements of the swap rate that may occur before time $T$. Suppose, for instance, that the swap rate at time $T$ is greater than $\kappa$. Then by combining the swaption with a market swap, the owner of a swaption with exercise rate $\kappa$ is entitled to enter at time $T$, at no additional cost, a swap contract in which the fixed rate is $\kappa$. If, on the contrary, the swap rate at time $T$ is less than $\kappa$, the swaption is worthless, but its owner is, of course, able to enter a market swap contract based on the current swap rate $\kappa(T, T, n) \le \kappa$. Concluding, the fixed rate paid by the owner of a swaption who intends to initiate a swap contract at time $T$ will never be above the preassigned level $\kappa$.

Notice that we have shown, in particular, that

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T}\, \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (\kappa(T, T, n) - \kappa)^+ \delta_j \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big). \qquad (3.18)$$

This shows that a payer swaption is essentially equivalent to a sequence of fixed payments $d^p_j = \delta_j (\kappa(T, T, n) - \kappa)^+$ which are received at settlement dates $T_1, \dots, T_n$, but whose value is known already at the expiry date $T$. In other words, a payer swaption can be seen as a specific call option on a forward swap rate, with fixed strike level $\kappa$. The exercise date of the option is $T$, but the payoff takes place at each date $T_1, \dots, T_n$. This equivalence may also be derived by directly verifying that the future cash flows from the following portfolios established at time $T$ are identical: portfolio A, a swaption and a market swap; and portfolio B, the call option on a swap rate just described and a market swap. Indeed, both portfolios correspond to a payer swap with the fixed rate equal to $\kappa$. Finally, the equality

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( 1 - \sum_{j=1}^{n} c_j B(T, T_j) \Big)^+ \,\Big|\, \mathcal{F}_t \Big) \qquad (3.19)$$

shows that the payer swaption may also be seen as a standard put option on a coupon-bearing bond with the coupon rate $\kappa$, with exercise date $T$ and strike price 1. Similar remarks are valid for the receiver swaption. In particular, a receiver swaption can also be viewed as a sequence of put options on a swap rate which are not allowed to be exercised separately. At time $T$ the long party receives the value of a sequence of cash flows, discounted from time $T_j$, $j = 1, \dots, n$, to the date $T$, defined by $\delta_j (\kappa - \kappa(T, T, n))^+$. On the other hand, a receiver swaption may be seen as a call option, with strike price 1 and expiry date $T$, written on a coupon bond with coupon rate equal to the strike rate $\kappa$ of the underlying forward swap. Let us finally mention the put–call parity relationship for swaptions. It follows easily from (3.15)–(3.17) that $PS_t - RS_t = FS_t$, i.e.,

payer swaption (t) − receiver swaption (t) = forward swap (t)

provided that both swaptions expire at the same date $T$ (and have the same contractual features).

18 By convention, the notional principal of the underlying swap (and thus also the notional principal of the swaption) equals $N_p = 1$.
19 At any time $t$, a market swap is that swap whose current value equals zero. Put more explicitly, it is the swap in which the fixed rate $\kappa$ equals the current swap rate.
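The two readings of the payer swaption payoff at expiry, as a call on the swap rate paid along an annuity (3.18) and as a put struck at 1 on a coupon bond (3.19), can be compared directly, together with the parity relation at $t = T$. The sketch below uses hypothetical bond prices $B(T, T_1), \dots, B(T, T_n)$ and function names that are not from the text.

```python
def annuity(bonds, deltas):
    """sum_j delta_j * B(T, T_j), with bonds = [B(T, T_1), ..., B(T, T_n)]."""
    return sum(d * b for d, b in zip(deltas, bonds))


def swap_value_at_expiry(bonds, deltas, kappa):
    """FS_T(kappa) = 1 - sum_j c_j B(T, T_j), i.e. (3.4) with B(T, T_0) = 1."""
    coupons = [kappa * d for d in deltas]
    coupons[-1] += 1.0  # c_n = 1 + kappa * delta_n
    return 1.0 - sum(c * b for c, b in zip(coupons, bonds))


def payer_payoff_rate_form(bonds, deltas, kappa):
    """Call on the swap rate (3.6), paid as delta_j-weighted cash flows."""
    a = annuity(bonds, deltas)
    rate = (1.0 - bonds[-1]) / a
    return max(rate - kappa, 0.0) * a


def payer_payoff_bond_form(bonds, deltas, kappa):
    """Put with strike 1 on the coupon bond with coupon rate kappa."""
    return max(swap_value_at_expiry(bonds, deltas, kappa), 0.0)


def receiver_payoff_bond_form(bonds, deltas, kappa):
    """Call with strike 1 on the same coupon bond."""
    return max(-swap_value_at_expiry(bonds, deltas, kappa), 0.0)


bonds = [0.98, 0.955, 0.93, 0.90]   # hypothetical B(T, T_j), j = 1..4
deltas = [0.5, 0.5, 0.5, 0.5]

for kappa in (0.04, 0.07):
    print(kappa,
          payer_payoff_rate_form(bonds, deltas, kappa),
          payer_payoff_bond_form(bonds, deltas, kappa),
          receiver_payoff_bond_form(bonds, deltas, kappa))
```

Since both forms agree state by state at time $T$, the corresponding prices agree at every earlier date as well.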

3.3.2 Forward swaptions

Let us now consider a forward swaption. In this case, we assume that the expiry date $\hat T$ of the swaption precedes the initiation date $T$ of the underlying payer swap, that is, $\hat T \le T$. Recall that

$$FS_t(\kappa) = \sum_{j=1}^{n} \big( \kappa(t, T, n) - \kappa \big) \delta_j B(t, T_j)$$

for $t \in [0, T]$. It is thus clear that the payoff $PS_{\hat T}$ at expiry $\hat T$ of the forward swaption (with strike 0) is either 0, if $\kappa \ge \kappa(\hat T, T, n)$, or

$$PS_{\hat T} = \sum_{j=1}^{n} \big( \kappa(\hat T, T, n) - \kappa \big) \delta_j B(\hat T, T_j)$$

if, on the contrary, the inequality $\kappa(\hat T, T, n) > \kappa$ holds. We conclude that the payoff $PS_{\hat T}$ of the forward swaption can be represented in the following way:

$$PS_{\hat T} = \sum_{j=1}^{n} \big( \kappa(\hat T, T, n) - \kappa \big)^+ \delta_j B(\hat T, T_j). \qquad (3.20)$$

This means that, if exercised, the forward swaption gives rise to a sequence of payments $\delta_j (\kappa(\hat T, T, n) - \kappa)$, one at each settlement date $T_1, \dots, T_n$. By substituting $\hat T = T$ we recover, in a more intuitive way and in a more general setting, the previously observed dual nature of the swaption: it may be seen either as an option on the value of a particular (forward) swap or, equivalently, as an option on the corresponding (forward) swap rate. It is also clear that the owner of a forward swaption is able to enter at time $\hat T$ (at no additional cost) into a forward payer swap with preassigned fixed interest rate $\kappa$.

3.3.3 Valuation in the lognormal model of forward Libor rates

Recall that within the general framework, the price at time $t \in [0, T_0]$ of a payer swaption$^{20}$ with expiry date $T = T_0$ and strike level $\kappa$ equals

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T} \Big( \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (L(T_{j-1}) - \kappa) \delta_j \,\Big|\, \mathcal{F}_T \Big) \Big)^+ \,\Big|\, \mathcal{F}_t \Big).$$

Let $D \in \mathcal{F}_T$ be the exercise set of a swaption; that is

$$D = \{ \omega \in \Omega \mid (\kappa(T, T, n) - \kappa)^+ > 0 \} = \Big\{ \omega \in \Omega \,\Big|\, \sum_{j=1}^{n} c_j B(T, T_j) < 1 \Big\}.$$

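Lemma 3.3 The following equality holds for every $t \in [0, T]$:

$$PS_t = \sum_{j=1}^{n} \delta_j B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}} \big( (L(T, T_{j-1}) - \kappa)\, I_D \,\big|\, \mathcal{F}_t \big). \qquad (3.21)$$

Proof Since

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \frac{B_t}{B_T}\, I_D\, \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_T}{B_{T_j}} (L(T_{j-1}) - \kappa) \delta_j \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big),$$

we have

$$PS_t = \mathbb{E}_{\mathbb{P}^*} \Big( \mathbb{E}_{\mathbb{P}^*} \Big( \sum_{j=1}^{n} \frac{B_t}{B_{T_j}} (L(T_{j-1}) - \kappa) \delta_j\, I_D \,\Big|\, \mathcal{F}_T \Big) \,\Big|\, \mathcal{F}_t \Big) = \sum_{j=1}^{n} B(t, T_j)\, \mathbb{E}_{\mathbb{P}_{T_j}} \big( (L(T_{j-1}) - \kappa) \delta_j\, I_D \,\big|\, \mathcal{F}_t \big),$$

where $L(T_{j-1}) = L(T_{j-1}, T_{j-1})$. For any $j = 1, \dots, n$, we have

$$\mathbb{E}_{\mathbb{P}_{T_j}} \big( (L(T_{j-1}) - \kappa)\, I_D \,\big|\, \mathcal{F}_t \big) = \mathbb{E}_{\mathbb{P}_{T_j}} \big( \mathbb{E}_{\mathbb{P}_{T_j}} \big( L(T_{j-1}) - \kappa \,\big|\, \mathcal{F}_T \big)\, I_D \,\big|\, \mathcal{F}_t \big) = \mathbb{E}_{\mathbb{P}_{T_j}} \big( (L(T, T_{j-1}) - \kappa)\, I_D \,\big|\, \mathcal{F}_t \big),$$

since $\mathcal{F}_t \subset \mathcal{F}_T$ and the process $L(t, T_{j-1})$ is a $\mathbb{P}_{T_j}$-martingale.

20 Since the relationship $PS_t - RS_t = FS_t$ is always valid, and the value of a forward swap is given by (3.4), it is enough to examine the case of a payer swaption.

For any $k = 1, \dots, n$, we define the random variable $\zeta_k(t)$ by setting

$$\zeta_k(t) = \int_t^T \lambda(u, T_{k-1}) \cdot dW^{T_k}_u, \quad \forall\, t \in [0, T], \qquad (3.22)$$

and we write

$$\lambda_k^2(t) = \int_t^T |\lambda(u, T_{k-1})|^2\, du, \quad \forall\, t \in [0, T]. \qquad (3.23)$$

Note that for every $k = 1, \dots, n$ and $t \in [0, T]$, we have

$$L(T, T_{k-1}) = L(t, T_{k-1})\, e^{\zeta_k(t) - \lambda_k^2(t)/2}.$$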

Recall also that the processes $W^{T_k}$ satisfy the following relationship:

$$W^{T_{k+1}}_t = W^{T_k}_t + \int_0^t \frac{\delta_{k+1} L(u, T_k)}{1 + \delta_{k+1} L(u, T_k)}\, \lambda(u, T_k)\, du$$

for $t \in [0, T_k]$ and $k = 0, \dots, n - 1$. For ease of notation, we formulate the next result for $t = 0$ only; a general case can be treated along the same lines. For any fixed $j$, we denote by $G_j$ the joint probability distribution function of the $n$-dimensional random variable $(\zeta_1(0), \dots, \zeta_n(0))$ under the forward measure $\mathbb{P}_{T_j}$.

Proposition 3.4 Assume the lognormal model of Libor rates. The price at time 0 of a payer swaption with expiry date $T = T_0$ and strike level $\kappa$ equals

$$PS_0 = \sum_{j=1}^{n} \delta_j B(0, T_j) \int_{\mathbb{R}^n} \Big( L(0, T_{j-1})\, e^{y_j - \lambda_j^2(0)/2} - \kappa \Big) I_{\tilde D}\, dG_j(y_1, \dots, y_n),$$

where $I_{\tilde D} = I_{\tilde D}(y_1, \dots, y_n)$, and $\tilde D$ stands for the set

$$\tilde D = \Big\{ (y_1, \dots, y_n) \in \mathbb{R}^n \,\Big|\, \sum_{j=1}^{n} c_j \prod_{k=1}^{j} \Big( 1 + \delta_k L(0, T_{k-1})\, e^{y_k - \lambda_k^2(0)/2} \Big)^{-1} < 1 \Big\}.$$

[…] $> 0$ is a constant, and $D = \{ V^1_T > K V^2_T \}$ is the exercise set. It is easy to check using the abstract Bayes rule that the equality

$$\frac{d\mathbb{P}_1}{d\mathbb{P}_2} = \frac{V^2_0 V^1_T}{V^1_0 V^2_T}, \quad \mathbb{P}_2\text{-a.s.}, \qquad (3.27)$$

links the martingale measures $\mathbb{P}_1$ and $\mathbb{P}_2$ associated with the choice of value processes $V^1$ and $V^2$ as discount factors, respectively (both probability measures are considered here on $(\Omega, \mathcal{F}_T)$). Furthermore, the arbitrage price of the option admits the following representation

$$C_t = V^1_t\, \mathbb{P}_1(D \mid \mathcal{F}_t) - K V^2_t\, \mathbb{P}_2(D \mid \mathcal{F}_t), \quad \forall\, t \in [0, T], \qquad (3.28)$$

where $D = \{ V^1_T > K V^2_T \}$. To obtain the Black–Scholes-like formula for the option's price $C_t$, it is enough to assume that the relative price $V^1/V^2$ follows a lognormal martingale under $\mathbb{P}_2$, so that

$$d(V^1_t / V^2_t) = (V^1_t / V^2_t)\, \gamma^{1,2}_t \cdot dW^{1,2}_t \qquad (3.29)$$

for a deterministic function $\gamma^{1,2} : [0, T] \to \mathbb{R}^d$ (for simplicity, we also assume that the function $\gamma^{1,2}$ is bounded). In view of (3.27), the Radon–Nikodým density of $\mathbb{P}_1$ with respect to $\mathbb{P}_2$ equals

$$\frac{d\mathbb{P}_1}{d\mathbb{P}_2} = \mathcal{E}_T \Big( \int_0^{\cdot} \gamma^{1,2}_u \cdot dW^{1,2}_u \Big), \quad \mathbb{P}_2\text{-a.s.}, \qquad (3.30)$$

and thus the process

$$W^{2,1}_t = W^{1,2}_t - \int_0^t \gamma^{1,2}_u\, du, \quad \forall\, t \in [0, T],$$

is a standard Brownian motion under $\mathbb{P}_1$. Reasoning in much the same way as in the proof of the classic Black–Scholes formula (see, for instance, the proof of Theorem 5.1.1 in Musiela and Rutkowski (1997a)), we obtain

$$C_t = V^1_t\, N\big( d_1(t, T) \big) - K V^2_t\, N\big( d_2(t, T) \big), \qquad (3.31)$$

where

$$d_{1,2}(t, T) = \frac{\ln(V^1_t / V^2_t) - \ln K \pm \frac{1}{2} v^2_{1,2}(t, T)}{v_{1,2}(t, T)}$$

and

$$v^2_{1,2}(t, T) = \int_t^T |\gamma^{1,2}_u|^2\, du, \quad \forall\, t \in [0, T].$$
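Formula (3.31) is straightforward to evaluate once $V^1_t$, $V^2_t$, $K$ and $v_{1,2}(t, T)$ are known. The sketch below implements it and, as an illustration, applies it to a payer swaption by taking $V^1$ as the difference of the first and last bond prices, $V^2$ as the annuity and $K = \kappa$ (the identification the chapter gives in its footnote on caps and swaptions). The constant volatility `sigma`, the bond prices and all function names are assumptions made for the example.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exchange_option_price(V1, V2, K, v):
    """Formula (3.31): V1 N(d1) - K V2 N(d2), with v = v_{1,2}(t, T) > 0."""
    d1 = (math.log(V1 / V2) - math.log(K) + 0.5 * v * v) / v
    d2 = d1 - v
    return V1 * norm_cdf(d1) - K * V2 * norm_cdf(d2)


def payer_swaption_price(bonds, deltas, kappa, sigma, expiry):
    """Payer swaption as a special case of (3.31).

    bonds  = [B(t, T_0), ..., B(t, T_n)], deltas = [delta_1, ..., delta_n].
    Assumes a constant swap-rate volatility sigma, so v = sigma * sqrt(expiry).
    """
    V1 = bonds[0] - bonds[-1]
    V2 = sum(d * b for d, b in zip(deltas, bonds[1:]))
    return exchange_option_price(V1, V2, kappa, sigma * math.sqrt(expiry))


bonds = [0.96, 0.94, 0.92, 0.90, 0.88]   # hypothetical discount factors
deltas = [0.5, 0.5, 0.5, 0.5]
annuity = sum(d * b for d, b in zip(deltas, bonds[1:]))
atm_rate = (bonds[0] - bonds[-1]) / annuity

print("forward swap rate:", atm_rate)
print("at-the-money payer swaption:", payer_swaption_price(bonds, deltas, atm_rate, 0.2, 1.0))
```

For an at-the-money strike ($V^1_t = K V^2_t$) the formula collapses to $V^1_t (2 N(v/2) - 1)$, which gives a quick sanity check on any implementation.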

Of course, the caps and swaptions22 valuation formulae in lognormal models described above can be seen as special cases of (3.31). The idea can be, of course, applied to other interest rate derivatives. It is worthwhile noting that in order to get the valuation result (3.31) for t = 0, it is enough to assume that the random variable VT1 /VT2 has a lognormal probability law under the martingale measure P2 . This simple observation underpins the construction of the so-called Markov-functional interest rate models – this alternative approach to term structure modelling is briefly reviewed in the next section. A more straightforward generalization of lognormal models of the term structure was developed by Andersen and Andreasen (1997). In this case, the assumption that the volatility is deterministic is replaced by a suitable functional form of the volatility. The resulting models are capable of handling the so-called volatility skew in observed option prices (empirical studies have shown that the implied volatilities of observed caps and swaptions prices tend to be decreasing functions of the strike level). The main focus in Andersen and Andreasen (1997) is on the use of the CEV process23 as a model of the forward Libor rate. Put more explicitly, they generalize equality (2.20) by postulating that T j+1

d L(t, T j ) = L α (t, T j ) λ(t, T j ) · dWt

,

∀ t ∈ [0, T j ],

where α > 0 is a strictly positive constant. They derive closed-form solutions for caplet prices under the above specification of the dynamics of Libor rates with α ≠ 1, in terms of the cumulative distribution function of a non-central χ² probability law. It appears that, depending on the choice of the parameter α, the implied Black volatilities of caplet prices, considered as a function of the strike level κ > 0, exhibit a downward- or upward-sloping skew.

[22] For the j-th caplet, we take $V^1_t = B(t,T_j) - B(t,T_{j+1})$ and $V^2_t = \delta_{j+1} B(t,T_{j+1})$. In the case of the j-th swaption, we have $V^1_t = B(t,T_j) - B(t,T_n)$ and $V^2_t = \sum_{k=j+1}^{n} \delta_k B(t,T_k)$.
[23] In the context of equity options, the CEV (constant elasticity of variance) process was first introduced in Cox and Ross (1976).

4 Markov-functional models

As shown in Section 2.2.4, the forward Libor or swap [24] rates follow a multidimensional Markov process under any of the associated forward measures. In principle, lognormal models can be easily calibrated to market prices of caps (or swaptions), which is, of course, a nice feature of this class of term structure models, as opposed to the classic models based on the specification of the dynamics of (spot or forward) instantaneous rates. On the other hand, due to the high dimensionality of the underlying Markov process, the efficient implementation of these models appears to be rather difficult.

[24] The multi-dimensional SDE which governs the dynamics of the family of forward swap rates is more involved than the SDE for the family of Libor rates, and thus it is not reported here. The interested reader is referred to Jamshidian (1997).

To circumvent this obstacle, an alternative approach was recently developed in a series of papers by Hunt and Kennedy (1997, 1998) and Hunt et al. (1996, 2000). [25] It is based on the introduction of a low-dimensional Markov process which (by assumption) governs, through a simple functional dependence, the dynamics of all other relevant stochastic processes. For this reason, this class of term structure models is referred to as Markov-functional interest rate models. In economic terms, the underlying Markov process is assumed to represent the state of the economy; it is thus justified to refer to its components as "state variables".

Formally, one starts by introducing a one- or multi-dimensional process M, which possesses the Markov property under the terminal measure, where the generic term terminal measure is intended to cover not only cases considered in previous sections, but also other suitable choices of the numeraire portfolio. As already mentioned, the relevant processes, such as in particular the value process of the numeraire portfolio and zero-coupon bond prices, are assumed to be functions of M. For instance, if $T^* > 0$ is the horizon date, then for any t ≤ s ≤ T we have

\[ \frac{B(t,T,M_t)}{V_t(M_t)} = E_{\hat P}\Big( \frac{B(s,T,M_s)}{V_s(M_s)} \,\Big|\, \mathcal{F}_t \Big), \]

where $V_t(M_t)$, $t \le T^*$, is the value process of the numeraire portfolio, and $\hat P$ is the associated martingale measure. The notation $B(t,T,M_t)$ emphasizes the direct dependence of the bond price on the time variables, t and T, as well as on the state variable represented by the random variable $M_t$.
Note that the functional form of $B(t,T,M_t)$ is not explicitly known, except for some very special choices of dates t and T. In some instances, it may appear convenient to postulate that [26]

\[ \frac{B(T,S,M_T)}{V_T(M_T)} = A + B(S)\,M_T \]

and to derive further properties from the martingale feature of relative prices. In the next section, we shall present a particular example of such an approach, in which we focus on the derivation of a simple formula for the so-called convexity correction. Then, in Section 4.2, we shall discuss the problem of calibration of the Markov-functional model.

[25] We present here only a few examples of their approach. The interested reader is referred to the original papers and to Hunt and Kennedy (2000) for a more detailed account.
[26] See Hunt et al. (1996) for alternative kinds of the functional dependence, including exponential and geometric.

10. Modelling of Forward Libor and Swap Rates


4.1 Terminal swap rate model

The terminal swap rate model, put forward by Hunt et al. (1996), was primarily designed for the purpose of the comparative pricing of non-standard swap contracts vis-à-vis plain vanilla swaps (informally, this is referred to as convexity correction; see Schmidt (1996)). Let us consider, as usual, a given collection of reset/settlement dates $T_0, \dots, T_n$. We assume that the market price at time 0 of the (plain vanilla) fixed-for-floating swaption is known. We postulate, in addition, that it is given by Black's formula for swaptions. Let us consider the family of bond prices $B(T,S)$, where the maturity date S ≥ T belongs to some set $\mathcal{S}$ of dates. We postulate that there exist constants A and $B_S$ such that for any $S \in \mathcal{S}$

\[ D(T,S) := B(T,S)\,G_T^{-1}(n) = A + B_S\,\kappa(T,T,n), \tag{4.1} \]

where $G_t(n) = \sum_{j=1}^{n} \delta_j B(t,T_j)$, and (cf. (3.8))

\[ \kappa(t,T,n) = \frac{B(t,T) - B(t,T_n)}{\delta_1 B(t,T_1) + \dots + \delta_n B(t,T_n)} = \frac{B(t,T) - B(t,T_n)}{G_t(n)}. \]

Using the martingale property of the discounted bond price $D(\cdot,S)$ and of the forward swap rate $\kappa(\cdot,T,n)$ under the forward swap measure associated with the choice of G(n) as a numeraire, we get

\[ D(t,S) = A + B_S\,\kappa(t,T,n), \]

or equivalently

\[ B(t,S) = A\,G_t(n) + B_S\,\big(B(t,T) - B(t,T_n)\big) \]

for every t ∈ [0, T]. We thus see that condition (4.1) is rather stringent; it implies that the price of any bond of maturity S from $\mathcal{S}$ can be represented as a linear combination of values of two particular portfolios of bonds, with one coefficient independent of the maturity date S. The problem of whether such an assumption can be supported by an arbitrage-free model of the term structure is not addressed in Hunt et al. (1996).

Let us now focus on the derivation of values of the constants A and $B_S$. To this end, we assume that equality (4.1) holds, in particular, for any $S = T_j$, j = 1, . . . , n. Then, since $\sum_{j=1}^{n} \delta_j D(T,T_j) = 1$ by the definition of $G_T(n)$,

\[ A \sum_{j=1}^{n} \delta_j + \sum_{j=1}^{n} \delta_j B_{T_j}\,\kappa(T,T,n) = A(T_n - T_0) + \sum_{j=1}^{n} \delta_j B_{T_j}\,\kappa(T,T,n) = 1, \]

and thus

\[ A = (T_n - T_0)^{-1}, \qquad \sum_{j=1}^{n} \delta_j B_{T_j} = 0. \tag{4.2} \]


Consequently, using the first equality above and the martingale property of $D(\cdot,S)$ and $\kappa(\cdot,T,n)$, we obtain

\[ B(0,S)\,G_0^{-1}(n) = (T_n - T_0)^{-1} + B_S\,\kappa(0,T,n), \tag{4.3} \]

so that for each maturity in question the constant $B_S$ is also uniquely determined. Notice that the second equality in (4.2) is also satisfied for this choice of $B_S$.

Hunt and Kennedy (2000) argue that under (4.1) the problem of pricing irregular cashflows becomes relatively easy to handle. To illustrate this point, assume that we wish to value the claim X which settles at time T and admits the following representation:

\[ X = \sum_{i=1}^{m} c_i\,B(T,S_i)\,F, \]

where the $c_i$ are constants, and $S_i \in \mathcal{S}$ for i = 1, . . . , m. We assume that the $\mathcal{F}_T$-measurable random variable F has the form

\[ F = \tilde F\big(B(T,S_1), \dots, B(T,S_m)\big) \]

for some function $\tilde F : \mathbb{R}^m_+ \to \mathbb{R}$. To be in line with the notation introduced in Section 3.4, we denote

\[ V^1_t = B(t,T) - B(t,T_n), \qquad V^2_t = \sum_{j=1}^{n} \delta_j B(t,T_j) = G_t(n). \]

Using (4.1) and (4.2)–(4.3), we obtain

\[ X = \sum_{i=1}^{m} c_i \big( B_{S_i}(1 - B(T,T_n)) + A\,G_T(n) \big) F = w_1 V^1_T F + w_2 V^2_T F, \]

where $w_1 = \sum_{i=1}^{m} c_i B_{S_i}$ and $w_2 = \sum_{i=1}^{m} c_i A$. In view of the discussion in Section 3.4, it is clear that

\[ \pi_t(X) = w_1 V^1_t\,E_{P_1}(F \mid \mathcal{F}_t) + w_2 V^2_t\,E_{P_2}(F \mid \mathcal{F}_t). \tag{4.4} \]
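As a small numerical illustration of (4.2) and (4.3), the sketch below computes A and the constants $B_S$ from a discount curve and checks that the delta-weighted sum of the $B_{T_j}$ vanishes. The flat 5% curve, the annual tenor dates and all function names are assumptions made for this example only; they do not come from the text.

```python
from math import exp

def discount(t, rate=0.05):
    """B(0, t) on a flat, continuously compounded curve (illustrative assumption)."""
    return exp(-rate * t)

# Hypothetical tenor structure T_0 < T_1 < ... < T_n with delta_j = T_j - T_{j-1}.
DATES = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
DELTAS = [DATES[j] - DATES[j - 1] for j in range(1, len(DATES))]

def annuity(dates, deltas):
    """G_0(n) = sum_{j=1}^n delta_j * B(0, T_j)."""
    return sum(d * discount(t) for d, t in zip(deltas, dates[1:]))

def forward_swap_rate(dates, deltas):
    """kappa(0, T, n) with T = T_0."""
    return (discount(dates[0]) - discount(dates[-1])) / annuity(dates, deltas)

def terminal_constants(dates, deltas, maturities):
    """A from (4.2) and B_S for each S in maturities, solved from (4.3)."""
    g0 = annuity(dates, deltas)
    kappa0 = forward_swap_rate(dates, deltas)
    a = 1.0 / (dates[-1] - dates[0])
    b = {s: (discount(s) / g0 - a) / kappa0 for s in maturities}
    return a, b

A, B = terminal_constants(DATES, DELTAS, DATES[1:])
# Second equality in (4.2): the delta-weighted sum of the B_{T_j} vanishes.
weighted_sum = sum(d * B[t] for d, t in zip(DELTAS, DATES[1:]))
print(A, weighted_sum)
```

The check relies on $\delta_j = T_j - T_{j-1}$, so that the accrual periods sum to $T_n - T_0$; with a different day-count convention A would have to be defined as the reciprocal of $\sum_j \delta_j$ instead.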

Under the assumption that the forward swap rate $\kappa(\cdot,T,n)$ follows a geometric Brownian motion under the forward swap measure $P_2$, it also follows a lognormally distributed process under $P_1$ (see the discussion in Section 3.4). Consequently, under (4.1), the joint (conditional) probability law of the random variables $B(T,S_1), \dots, B(T,S_m)$ under the probability measures $P_1$ and $P_2$ is explicitly known. We conclude that the conditional expectations in (4.4) can be, in principle, evaluated.

Consider, for instance, a fixed-for-floating constant maturity swap. [27] To value one leg of the floating side of a constant maturity swap, consider a cashflow proportional to $\kappa(T,T,n)$, which takes place at some date M > T. Ignoring the constant, such a payoff is equivalent to the claim $X = B(T,M)\,\kappa(T,T,n)$ which settles at time T. Using (4.4), we obtain

\[ \pi_t(X) = B_M V^1_t\,E_{P_1}(\kappa(T,T,n) \mid \mathcal{F}_t) + A V^2_t\,E_{P_2}(\kappa(T,T,n) \mid \mathcal{F}_t). \]

Consequently, at time 0 we have

\[ \pi_0(X) = B_M \big(B(0,T) - B(0,T_n)\big)\,\kappa(0,T,n)\,e^{\sigma^2 T} + A\,G_0(n)\,\kappa(0,T,n), \]

where σ is the implied volatility of the traded swaption with maturity date T. Using the formula (4.3) for $B_M$, together with $B(0,T) - B(0,T_n) = \kappa(0,T,n)\,G_0(n)$, we get

\[ \pi_0(X) = \big(B(0,M) - A\,G_0(n)\big)\,\kappa(0,T,n)\,e^{\sigma^2 T} + A\,G_0(n)\,\kappa(0,T,n), \]

or finally

\[ \pi_0(X) = B(0,M)\,\kappa(0,T,n)\,\big( w + (1 - w)\,e^{\sigma^2 T} \big), \tag{4.5} \]

where we write $w = A\,G_0(n)\,B^{-1}(0,M)$. It should be stressed that the simple valuation result (4.5) hinges on the strong assumption (4.1).

[27] As in the case of a plain vanilla fixed-for-floating swap, in a constant maturity swap the fixed and floating payments occur at regularly spaced dates. The amounts of the floating payments, however, are based not on a Libor rate but on some other swap rate.
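The time-0 value of the constant maturity swap cashflow can be checked numerically. The sketch below evaluates the two-term expression $B_M V^1_0 E_{P_1}(\kappa) + A V^2_0 E_{P_2}(\kappa)$ and the form obtained by collecting terms with $w = A G_0(n)/B(0,M)$, and confirms that they agree. The function name, the flat 5% curve and the 20% volatility are illustrative assumptions, not values from the text.

```python
from math import exp

def cms_claim_value(b0_fix, b0_last, b0_pay, annuity0, sigma, t_fix, t_last):
    """Value pi_0(X) of X = B(T, M) * kappa(T, T, n), computed two ways.

    b0_fix, b0_last, b0_pay : B(0, T), B(0, T_n), B(0, M)
    annuity0                : G_0(n)
    sigma                   : implied volatility of the swaption expiring at T
    t_fix, t_last           : T = T_0 and T_n
    """
    kappa0 = (b0_fix - b0_last) / annuity0
    a = 1.0 / (t_last - t_fix)                    # A from (4.2)
    b_m = (b0_pay / annuity0 - a) / kappa0        # B_M from (4.3)
    growth = exp(sigma ** 2 * t_fix)              # E_{P_1} kappa(T) / kappa(0), as in the text

    # Two-term form: B_M V_0^1 E_{P_1}(kappa) + A V_0^2 E_{P_2}(kappa).
    two_term = b_m * (b0_fix - b0_last) * kappa0 * growth + a * annuity0 * kappa0

    # Collected form with w = A G_0(n) / B(0, M).
    w = a * annuity0 / b0_pay
    collected = b0_pay * kappa0 * (w + (1.0 - w) * growth)
    return two_term, collected

# Hypothetical inputs: flat 5% curve, T_0 = 1, annual dates up to T_n = 6, payment at M = 2.
RATE = 0.05
ANNUITY0 = sum(exp(-RATE * t) for t in (2.0, 3.0, 4.0, 5.0, 6.0))
two_term, collected = cms_claim_value(
    exp(-RATE * 1.0), exp(-RATE * 6.0), exp(-RATE * 2.0), ANNUITY0, 0.2, 1.0, 6.0
)
print(two_term, collected)
```

With zero volatility both expressions reduce to $B(0,M)\,\kappa(0,T,n)$, the value obtained by simply discounting the forward swap rate; the excess over that value for σ > 0 is the convexity correction.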

4.2 Calibration of Markov-functional models

The most important feature of Markov-functional models is the fact that their calibration to market prices of plain vanilla derivatives is relatively easy to perform. For convenience, we shall focus here on the calibration of the Markov-functional model of fixed-maturity forward swap rates. The case of forward Libor rates can be dealt with in an analogous way. A more extensive discussion of this issue can be found in Hunt et al. (2000).

First, we assume that the forward swap rate for the date $T_{n-1}$ follows a lognormal martingale under the corresponding forward measure $P_{T_n}$. More specifically, we postulate that the process $\tilde\kappa(\cdot,T_{n-1}) = \kappa(\cdot,T_{n-1},1)$ satisfies

\[ d\tilde\kappa(t,T_{n-1}) = \tilde\kappa(t,T_{n-1})\,\nu(t,T_{n-1})\,dW_t, \tag{4.6} \]

where W is a Brownian motion under $P_{T_n}$ and $\nu(\cdot,T_{n-1})$ is a strictly positive deterministic function. If we take the process

\[ M_t = \int_0^t \nu(u,T_{n-1})\,dW_u \]

as the driving Markov process for our model, then clearly

\[ \tilde\kappa(T_{n-1},T_{n-1}) = \tilde\kappa(0,T_{n-1})\,\exp\Big( M_{T_{n-1}} - \tfrac{1}{2}\int_0^{T_{n-1}} \nu^2(u,T_{n-1})\,du \Big) \tag{4.7} \]

and

\[ B(T_{n-1},T_n,M_{T_{n-1}}) = \Big( 1 + \delta_n\,\tilde\kappa(0,T_{n-1})\,\exp\Big( M_{T_{n-1}} - \tfrac{1}{2}\int_0^{T_{n-1}} \nu^2(u,T_{n-1})\,du \Big) \Big)^{-1}. \tag{4.8} \]
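Formulae (4.7) and (4.8) translate directly into two functions of the state variable. The sketch below implements them and checks, by numerical integration against the Gaussian law of $M_{T_{n-1}}$, that the swap rate has mean $\tilde\kappa(0,T_{n-1})$ under $P_{T_n}$, as the martingale property requires. The constant volatility, the initial rate and the helper names are assumptions chosen for illustration.

```python
from math import exp, pi, sqrt

def swap_rate_terminal(m, kappa0, var):
    """(4.7): kappa~(T_{n-1}, T_{n-1}) as a function of m = M_{T_{n-1}},
    where var = int_0^{T_{n-1}} nu^2(u, T_{n-1}) du."""
    return kappa0 * exp(m - 0.5 * var)

def bond_terminal(m, kappa0, var, delta_n):
    """(4.8): B(T_{n-1}, T_n, m) = 1 / (1 + delta_n * kappa~)."""
    return 1.0 / (1.0 + delta_n * swap_rate_terminal(m, kappa0, var))

def gaussian_mean(func, var, n_steps=4000, width=10.0):
    """E[func(M)] for M ~ N(0, var), by the trapezoidal rule on +/- width std devs."""
    sd = sqrt(var)
    lo = -width * sd
    step = 2.0 * width * sd / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        x = lo + k * step
        weight = 0.5 if k in (0, n_steps) else 1.0
        density = exp(-x * x / (2.0 * var)) / sqrt(2.0 * pi * var)
        total += weight * func(x) * density
    return total * step

# Hypothetical inputs: constant nu = 0.2 up to T_{n-1} = 5, kappa~(0) = 5%, delta_n = 1.
VAR = 0.2 ** 2 * 5.0
KAPPA0 = 0.05
mean_rate = gaussian_mean(lambda m: swap_rate_terminal(m, KAPPA0, VAR), VAR)
print(mean_rate)
```

Since ν is deterministic, $M_{T_{n-1}}$ is a centred Gaussian variable with variance $\int_0^{T_{n-1}} \nu^2(u,T_{n-1})\,du$, which is what the integration routine assumes.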

Suppose that we are given (digital) swaption prices for all strikes κ > 0 and all expiration dates $T_0, \dots, T_{n-1}$. Our goal is to find the joint probability law of $(\tilde\kappa(T_0,T_0), \dots, \tilde\kappa(T_{n-1},T_{n-1}))$ under $P_{T_n}$. This can be achieved by deriving the functional dependence of each rate $\tilde\kappa(T_j,T_j)$ on the underlying Markov process; more specifically, we search for the function $h_j : \mathbb{R} \to \mathbb{R}_+$ such that $\tilde\kappa(T_j,T_j) = h_j(M_{T_j})$. To this end, we assume that for any j = 0, . . . , n − 1 there exists a strictly increasing function $h_j$ such that this holds (in view of (4.7), this statement is valid for j = n − 1). By the definition of the probability measure $P_{T_n}$, for i = j + 1, . . . , n

\[ \frac{B(T_j,T_i)}{B(T_j,T_n)} = E_{P_{T_n}}\Big( \frac{B(T_i,T_i)}{B(T_i,T_n)} \,\Big|\, \mathcal{F}_{T_j} \Big) = E_{P_{T_n}}\Big( \frac{B(T_i,T_i)}{B(T_i,T_n)} \,\Big|\, M_{T_j} \Big), \]

since $\mathcal{F}_{T_j} = \mathcal{F}^W_{T_j} = \mathcal{F}^M_{T_j}$ and M is a Markov process. Therefore, if $B(T_i,T_n) = B(T_i,T_n,M_{T_i})$ we obtain

\[ \frac{B(T_j,T_i)}{B(T_j,T_n)} = E_{P_{T_n}}\Big( \frac{1}{B(T_i,T_n,M_{T_i})} \,\Big|\, M_{T_j} \Big), \]

so that the right-hand side in the formula above is a function of $M_{T_j}$. Consequently, for

\[ G_{T_j}(n-j) = \sum_{i=j+1}^{n} \delta_i B(T_j,T_i) \]

we get

\[ \frac{G_{T_j}(n-j)}{B(T_j,T_n)} = E_{P_{T_n}}\Big( \sum_{i=j+1}^{n} \frac{\delta_i}{B(T_i,T_n,M_{T_i})} \,\Big|\, M_{T_j} \Big) = g_j(M_{T_j}), \tag{4.9} \]

where $g_j : \mathbb{R} \to \mathbb{R}$ is a measurable function with strictly positive values. The right-hand side in (4.9) can be evaluated using the transition p.d.f. $p_M(t,m;u,x)$ of the Markov process M, provided that the functional form of $B(T_i,T_n,M_{T_i})$ is known for every i = j + 1, . . . , n. To put it more explicitly,

\[ g_j(m) = \sum_{i=j+1}^{n} \delta_i \int_{\mathbb{R}} \frac{p_M(T_j,m;T_i,x)}{B(T_i,T_n,x)}\,dx. \tag{4.10} \]

We work back iteratively from the last relevant date $T_{n-1}$. In the first step, i.e., when j = n − 2, the functional form of $B(T_{n-1},T_n,M_{T_{n-1}})$ is given by (4.8). Assume now that the functional forms of $B(T_i,T_n,M_{T_i})$ were already found for i = j + 1, . . . , n − 1. In order to determine $B(T_j,T_n,M_{T_j})$, it is enough to find the functional form of the swap rate $\tilde\kappa(T_j,T_j)$. Indeed, we have

\[ \tilde\kappa(T_j,T_j) = \frac{1 - B(T_j,T_n)}{G_{T_j}(n-j)} \]

and thus

\[ B^{-1}(T_j,T_n) = 1 + \tilde\kappa(T_j,T_j)\,\frac{G_{T_j}(n-j)}{B(T_j,T_n)} = 1 + h_j(M_{T_j})\,g_j(M_{T_j}). \tag{4.11} \]
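The first backward step, j = n − 2, can be sketched numerically. Because M is a Wiener integral of a deterministic function, its transition density is Gaussian with variance equal to the integral of $\nu^2$ over the time step, so (4.10) becomes an integral against a normal density. The code below evaluates $g_{n-2}(m)$ this way and compares it with the closed form $\delta_{n-1}\big(1 + \delta_n \tilde\kappa(0,T_{n-1})\,e^{m - V/2}\big) + \delta_n$, where V is the variance of $M_{T_{n-2}}$; this closed form follows from (4.8) and is specific to this step. All numerical inputs and names are assumptions for the example.

```python
from math import exp, pi, sqrt

def integrate_against_normal(func, mean, var, n_steps=4000, width=10.0):
    """int func(x) * N(x; mean, var) dx by the trapezoidal rule."""
    sd = sqrt(var)
    lo = mean - width * sd
    step = 2.0 * width * sd / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        x = lo + k * step
        weight = 0.5 if k in (0, n_steps) else 1.0
        density = exp(-(x - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)
        total += weight * func(x) * density
    return total * step

def g_function(m, delta_n, earlier_terms):
    """(4.10) at M_{T_j} = m.

    delta_n       : accrual for i = n, where B(T_n, T_n, x) = 1 and the integral is 1
    earlier_terms : list of (delta_i, var_incr_i, bond_i) for j < i < n, with
                    var_incr_i = Var(M_{T_i} - M_{T_j}) and bond_i(x) = B(T_i, T_n, x)
    """
    value = delta_n
    for delta_i, var_incr, bond in earlier_terms:
        value += delta_i * integrate_against_normal(lambda x: 1.0 / bond(x), m, var_incr)
    return value

# Hypothetical inputs: constant nu = 0.2, T_{n-2} = 4, T_{n-1} = 5, unit accruals.
NU, KAPPA0 = 0.2, 0.05
V4, V5 = NU ** 2 * 4.0, NU ** 2 * 5.0

def bond_last(x):
    """B(T_{n-1}, T_n, x) from (4.8) with delta_n = 1."""
    return 1.0 / (1.0 + KAPPA0 * exp(x - 0.5 * V5))

g_numeric = g_function(0.3, 1.0, [(1.0, V5 - V4, bond_last)])
g_exact = 1.0 * (1.0 + KAPPA0 * exp(0.3 - 0.5 * V4)) + 1.0
print(g_numeric, g_exact)
```

For earlier dates no closed form is available in general, and the same routine would be applied with the bond functions obtained from (4.11) at the later dates.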

Our next goal is to show how to find the function $h_j$, under the assumption that the functional forms of bond prices $B(T_i,T_n,M_{T_i})$ are known for every i = j + 1, . . . , n. To this end, we assume that we are given all market prices of digital swaptions with expiration date $T_j$ and any strictly positive strike level κ. We find it convenient to represent the price at time 0 of the j-th digital swaption, with strike κ and expiration date $T_j$, in the following way: [28]

\[ DS^j_0(\kappa) = B(0,T_n)\,E_{P_{T_n}}\Big( \frac{G_{T_j}(n-j)}{B(T_j,T_n)}\,\mathbf{1}_{\{\tilde\kappa(T_j,T_j) > \kappa\}} \Big) \]

for j = 0, . . . , n − 2. Under the present assumptions, we obtain

\[ DS^j_0(\kappa) = B(0,T_n)\,E_{P_{T_n}}\big( g_j(M_{T_j})\,\mathbf{1}_{\{h_j(M_{T_j}) > \kappa\}} \big), \]

or equivalently,

\[ DS^j_0(\kappa) = B(0,T_n)\,E_{P_{T_n}}\big( g_j(M_{T_j})\,\mathbf{1}_{\{M_{T_j} > h_j^{-1}(\kappa)\}} \big). \]

Finally, if we denote by $f_M(x) = p_M(0,0;T_j,x)$ the p.d.f. of $M_{T_j}$ under $P_{T_n}$, then

\[ DS^j_0(\kappa) = B(0,T_n) \int_{\mathbb{R}} g_j(x)\,\mathbf{1}_{\{x > \hat h_j(\kappa)\}}\,f_M(x)\,dx, \tag{4.12} \]

where we write $\hat h_j = h_j^{-1}$. It is natural to assume [29] that the function $DS^j_0 : \mathbb{R}_+ \to \mathbb{R}_+$ is strictly decreasing as a function of the strike level κ, with

\[ DS^j_0(0) = \sum_{i=j+1}^{n} \delta_i B(0,T_i) = G_0(n-j) \]

and $DS^j_0(+\infty) = 0$. Since

\[ E_{P_{T_n}}\,g_j(M_{T_j}) = G_0(n-j)\,B^{-1}(0,T_n), \]

it can be deduced from (4.12) that $\hat h_j(0) = -\infty$. On the other hand, the condition $DS^j_0(+\infty) = 0$ implies that $\hat h_j(+\infty) = +\infty$. Finally, the function $\hat h_j$ implicitly defined through equality (4.12) is strictly increasing, so that it admits an inverse function $h_j$ with the desired properties. To wit, for $h_j = \hat h_j^{-1}$ we have: $h_j : \mathbb{R} \to \mathbb{R}_+$ is strictly increasing, with $h_j(-\infty) = 0$ and $h_j(+\infty) = +\infty$. This shows that the procedure above leads to a reasonable specification of the functional form $\tilde\kappa(T_j,T_j) = h_j(M_{T_j})$.

[28] By definition, the j-th digital swaption, with unit notional principal, pays the amount $\delta_i$ at time $T_i$ for i = j + 1, . . . , n whenever the inequality $\tilde\kappa(T_j,T_j) > \kappa$ holds.
[29] Recall that the function $DS^j_0$ represents the observed market prices of digital swaptions. Therefore, the foregoing assumptions about the behaviour of this function are indeed quite natural.

For the reader's convenience, we shall recapitulate the main steps of the calibration procedure. In the first step, we numerically find the function $h_{n-2}$ which expresses $\tilde\kappa(T_{n-2},T_{n-2})$ in terms of $M_{T_{n-2}}$. To this end, we first need to evaluate the function $g_{n-2}$ using formula (4.10) with $B(T_n,T_n,x) = 1$ and $B(T_{n-1},T_n,x)$ given by (4.8). In the second step, we first determine $B(T_{n-2},T_n,x)$ using relationship (4.11), that is,

\[ B^{-1}(T_{n-2},T_n,x) = 1 + h_{n-2}(x)\,g_{n-2}(x). \]

Then, we find $g_{n-3}$ using (4.10), and subsequently we determine the rate $\tilde\kappa(T_{n-3},T_{n-3})$, or rather the corresponding function $h_{n-3}$. Continuing this procedure, we end up with the following representation of the finite family of swap rates:

\[ \big( \tilde\kappa(T_0,T_0), \dots, \tilde\kappa(T_{n-1},T_{n-1}) \big) = \big( h_0(M_{T_0}), \dots, h_{n-1}(M_{T_{n-1}}) \big). \]

This representation uniquely specifies the probability law of the considered family of swap rates under the terminal forward measure $P_{T_n}$.

Remarks. In view of (4.6), the price at time $t \le T_{n-1}$ of the (n − 1)th digital swaption equals

\[ DS^{n-1}_t(\kappa) = \delta_n B(t,T_n)\,P_{T_n}\{ \tilde\kappa(T_{n-1},T_{n-1}) > \kappa \mid \mathcal{F}_t \}, \]

that is,

\[ DS^{n-1}_t(\kappa) = \delta_n B(t,T_n)\,N\big( \tilde h_2(t,T_{n-1}) \big), \tag{4.13} \]

where N denotes the standard Gaussian cumulative distribution function, and the coefficient $\tilde h_2$ is given in the formulation of Proposition 3.5. Needless to say, formula (4.13) is not valid in the present setup, even for t = 0, for any digital swaption with maturity $T_0, \dots, T_{n-2}$. Moreover, it is clear that assumption (4.6) is not necessary; we need only assume that the functional form of the swap rate $\tilde\kappa(T_{n-1},T_{n-1})$ with respect to some underlying Markov process M is explicitly known (and is a monotone function of $M_{T_{n-1}}$).
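The inversion of (4.12) for the threshold $\hat h_j(\kappa)$ can be sketched as a one-dimensional root search: for a given market price of the digital swaption, one looks for the level x at which the model integral over $(x, +\infty)$ matches that price. The code below uses bisection and a trapezoidal tail integral, and is tested on the case j = n − 1, where $g_{n-1} \equiv \delta_n$ and the answer is known from (4.7): $\hat h_{n-1}(\kappa) = \ln(\kappa/\tilde\kappa(0,T_{n-1})) + V/2$ with V the variance of $M_{T_{n-1}}$. The function names, the Gaussian law assumed for $M_{T_j}$ and the numerical inputs are assumptions of this sketch, not part of the original procedure.

```python
from math import erf, exp, log, pi, sqrt

def model_digital_price(threshold, g, var, b0_n, n_steps=2000, width=10.0):
    """Right-hand side of (4.12): B(0, T_n) * int_{threshold}^{inf} g(x) f_M(x) dx,
    assuming M_{T_j} ~ N(0, var); the tail is cut at width standard deviations."""
    sd = sqrt(var)
    hi = width * sd
    lo = max(threshold, -hi)
    if lo >= hi:
        return 0.0
    step = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        x = lo + k * step
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * g(x) * exp(-x * x / (2.0 * var)) / sqrt(2.0 * pi * var)
    return b0_n * total * step

def h_hat(market_price, g, var, b0_n, tol=1e-9):
    """Solve model_digital_price(x) = market_price for x by bisection.
    The model price is decreasing in x because g is strictly positive."""
    lo, hi = -10.0 * sqrt(var), 10.0 * sqrt(var)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model_digital_price(mid, g, var, b0_n) > market_price:
            lo = mid      # model price too high: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical check for j = n - 1: lognormal swap rate, g_{n-1}(x) = delta_n.
VAR, KAPPA0, DELTA_N, B0_N, STRIKE = 0.2, 0.05, 1.0, exp(-0.3), 0.06
d2 = (log(KAPPA0 / STRIKE) - 0.5 * VAR) / sqrt(VAR)
market = DELTA_N * B0_N * 0.5 * (1.0 + erf(d2 / sqrt(2.0)))
x_numeric = h_hat(market, lambda x: DELTA_N, VAR, B0_N)
x_exact = log(STRIKE / KAPPA0) + 0.5 * VAR
print(x_numeric, x_exact)
```

Repeating the search over a grid of strikes gives $\hat h_j$ pointwise, and $h_j$ is then obtained by inverting the resulting increasing function, for example by interpolation.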


References

Andersen, L. (2000), A simple approach to the pricing of Bermudan swaptions in the multifactor LIBOR market model, Journal of Computational Finance 3(2), 5–32.
Andersen, L. and Andreasen, J. (1997), Volatility skews and extensions of the Libor market model, working paper, National Australia Bank and University of New South Wales.
Brace, A. (1996), Dual swap and swaption formulae in the normal and lognormal models, working paper, University of New South Wales.
Brace, A., Gątarek, D. and Musiela, M. (1997), The market model of interest rate dynamics, Mathematical Finance 7, 127–54.
Brace, A., Musiela, M. and Schlögl, E. (1998), A simulation algorithm based on measure relationships in the lognormal market model, working paper, University of New South Wales.
Brace, A. and Womersley, R.S. (2000), Exact fit to the swaption volatility matrix using semidefinite programming, working paper, National Australia Bank and University of New South Wales.
Bühler, W. and Käsler, J. (1989), Konsistente Anleihenpreise und Optionen auf Anleihen, working paper, University of Dortmund.
Cox, J. and Ross, S. (1976), The valuation of options for alternative stochastic processes, Journal of Financial Economics 3, 145–66.
Döberlein, F. and Schweizer, M. (1998), On term structure models generated by semimartingales, working paper, Technische Universität Berlin.
Döberlein, F., Schweizer, M. and Stricker, C. (2000), Implied savings accounts are unique, Finance and Stochastics 4, 431–42.
Dun, T., Schlögl, E. and Barton, G. (2000), Simulated swaption delta-hedging in the lognormal forward LIBOR model, working paper, University of Sydney and University of Technology, Sydney.
Flesaker, B. (1993), Arbitrage free pricing of interest rate futures and forward contracts, Journal of Futures Markets 13, 77–91.
Flesaker, B. and Hughston, L. (1996a), Positive interest, Risk 9(1), 46–9.
Flesaker, B. and Hughston, L. (1996b), Positive interest: foreign exchange, in: Vasicek and Beyond, L. Hughston, ed., Risk Publications, London, pp. 351–67.
Flesaker, B. and Hughston, L. (1997), Dynamic models of yield curve evolution, in: Mathematics of Derivative Securities, M.A.H. Dempster and S.R. Pliska, eds., Cambridge University Press, Cambridge, pp. 294–314.
Geman, H., El Karoui, N. and Rochet, J.C. (1995), Changes of numeraire, changes of probability measures and pricing of options, Journal of Applied Probability 32, 443–58.
Glasserman, P. and Kou, S.G. (1999), The term structure of simple forward rates with jump risk, working paper, Columbia University.
Glasserman, P. and Zhao, X. (1999), Fast greeks by simulation in forward LIBOR models, Journal of Computational Finance 3(1), 5–39.
Glasserman, P. and Zhao, X. (2000), Arbitrage-free discretization of lognormal forward Libor and swap rate model, Finance and Stochastics 4, 35–68.
Goldys, B. (1997), A note on pricing interest rate derivatives when Libor rates are lognormal, Finance and Stochastics 1, 345–52.
Goldys, B., Musiela, M. and Sondermann, D. (1994), Lognormality of rates and term structure models, working paper, University of New South Wales.
Heath, D., Jarrow, R. and Morton, A. (1992), Bond pricing and the term structure of interest rates: a new methodology for contingent claim valuation, Econometrica 60, 77–105.
Hull, J.C. and White, A. (1999), Forward rate volatilities, swap rate volatilities, and the implementation of the LIBOR market model, working paper, University of Toronto.
Hunt, P.J. and Kennedy, J.E. (1997), On convexity corrections, working paper, ABN-Amro Bank and University of Warwick.
Hunt, P.J. and Kennedy, J.E. (1998), Implied interest rate pricing model, Finance and Stochastics 2, 275–93.
Hunt, P.J. and Kennedy, J.E. (2000), Financial Derivatives in Theory and Practice, John Wiley & Sons, Chichester.
Hunt, P.J., Kennedy, J.E. and Pelsser, A. (2000), Markov-functional interest rate models, Finance and Stochastics 4, 391–408.
Hunt, P.J., Kennedy, J.E. and Scott, E.M. (1996), Terminal swap-rate models, working paper, ABN-Amro Bank and University of Warwick.
Jamshidian, F. (1996), Pricing and hedging European swaptions with deterministic (lognormal) forward swap rate volatility, working paper, Sakura Global Capital.
Jamshidian, F. (1997), Libor and swap market models and measures, Finance and Stochastics 1, 293–330.
Jamshidian, F. (1999), Libor market model with semimartingales, working paper, NetAnalytic Limited.
Jin, Y. and Glasserman, P. (1997), Equilibrium positive interest rates: a unified view, forthcoming in Review of Financial Studies.
Lotz, C. and Schlögl, L. (2000), Default risk in a market model, Journal of Banking and Finance 24, 301–27.
Miltersen, K., Sandmann, K. and Sondermann, D. (1997), Closed form solutions for term structure derivatives with log-normal interest rates, Journal of Finance 52, 409–30.
Musiela, M. (1994), Nominal annual rates and lognormal volatility structure, working paper, University of New South Wales.
Musiela, M. and Rutkowski, M. (1997a), Martingale Methods in Financial Modelling, Springer-Verlag, Berlin.
Musiela, M. and Rutkowski, M. (1997b), Continuous-time term structure models: forward measure approach, Finance and Stochastics 1, 261–91.
Musiela, M. and Sawa, J. (1998), Interpolation and modelling term structure, working paper, University of New South Wales.
Musiela, M. and Sondermann, D. (1993), Different dynamical specifications of the term structure of initial rates and their implications, working paper, University of Bonn.
Neuberger, A. (1990), Pricing swap options using the forward swap market, working paper, London Business School.
Rady, S. (1997), Option pricing in the presence of natural boundaries and a quadratic diffusion term, Finance and Stochastics 1, 331–44.
Rady, S. and Sandmann, K. (1994), The direct approach to debt option pricing, Review of Futures Markets 13, 461–514.
Rebonato, R. (1999), On the pricing implications of the joint lognormal assumption for the swaption and cap markets, Journal of Computational Finance 2(3), 57–76.
Rebonato, R. (2000), On the simultaneous calibration of multifactor lognormal interest rate models to Black volatilities and to the correlation matrix, Journal of Computational Finance 2(4), 5–27.
Rutkowski, M. (1997), A note on the Flesaker-Hughston model of term structure of interest rates, Applied Mathematical Finance 4, 151–63.
Rutkowski, M. (1998), Dynamics of spot, forward, and futures Libor rates, International Journal of Theoretical and Applied Finance 1, 425–45.
Rutkowski, M. (1999), Models of forward Libor and swap rates, Applied Mathematical Finance 6, 29–60.
Sandmann, K. and Sondermann, D. (1993), On the stability of lognormal interest rate models, working paper, University of Bonn.
Sandmann, K. and Sondermann, D. (1997), A note on the stability of lognormal interest rate models and the pricing of Eurodollar futures, Mathematical Finance 7, 119–25.
Sandmann, K., Sondermann, D. and Miltersen, K.R. (1995), Closed form term structure derivatives in a Heath–Jarrow–Morton model with log-normal annually compounded interest rates, in: Proceedings of the Seventh Annual European Futures Research Symposium Bonn, 1994, Chicago Board of Trade, pp. 145–65.
Schlögl, E. (1999), A multicurrency extension of the lognormal interest rate market model, working paper, University of Technology, Sydney.
Schmidt, W.M. (1996), Pricing irregular interest cash flows, working paper, Deutsche Morgan Grenfell.
Schoenmakers, J. and Coffey, B. (1999), Libor rates models, related derivatives and model calibration, working paper.
Sidenius, J. (1997), Libor market models in practice, Journal of Computational Finance 3(3), 5–26.
Uratani, T. and Utsunomiya, M. (1999), Lattice calculation for forward LIBOR model, working paper, Hosei University.
Yasuoka, T. (1998), No arbitrage relation between a swaption and a cap/floor in the framework of Brace, Gatarek and Musiela, working paper, Fuji Research Institute Corporation.
Yasuoka, T. (1999), Mathematical pseudo-completion of the BGM model, working paper, Fuji Research Institute Corporation.

Part three Risk Management and Hedging

11 Credit Risk Modelling: Intensity Based Approach

Tomasz R. Bielecki and Marek Rutkowski

1 Introduction

Let B(t, T) and D(t, T) denote prices at time t of default-free and default-risky (or defaultable) zero coupon bonds maturing at time T, respectively. The default-free bond pays $1 at time T. The (recovery) payment for the default-risky bond needs to be modelled. If the bond defaults prior to or on the maturity date, two major situations are commonly considered: (a) the recovery payment is received by the holder of the defaultable bond at the default time of the bond, or (b) the recovery payment is received by the holder of the defaultable bond at the maturity time of the bond. Of course, if the defaultable bond does not default prior to or on the maturity date, then it pays $1 at maturity.

In this chapter we present a survey of recent research efforts aimed at pricing and hedging of default-prone debt instruments. We concentrate on intensity and ratings based approaches. In particular we review some results derived by Duffie, Schröder and Skiadas (1996), Duffie and Singleton (1998a, 1999), Jarrow and Turnbull (1995, 2000), Jarrow, Lando and Turnbull (1997), Lando (1998), Madan and Unal (1998a, 1998b), Jeanblanc and Rutkowski (2000a, 2000b), Bielecki and Rutkowski (1999, 2000), and Lotz and Schlögl (2000), among results obtained by other researchers. In addition we present a brief survey of some important types of credit derivatives, that is, derivative products linked to either corporate or sovereign debt, and we describe how to price them within the Bielecki and Rutkowski approach. It should be emphasized that the need to rationally price and hedge credit derivatives, whose presence in financial markets has been continuously growing in recent years, was one of the motivations, besides the need to manage credit risk, behind the explosion of research on quantitative aspects of credit risk that has been observed in the 1990s.

Let us mention here that the firm-specific approach (that is, an approach based on observations of the value of the debt's issuer) is not addressed in the present chapter. This alternative approach was initiated in the 1970s by Merton (1974), Black and Cox (1976), and Geske (1977). It was subsequently developed in various directions by several authors; to mention a few: Brennan and Schwartz (1997, 1980), Pitts and Selby (1983), Rendleman (1992), Kim et al. (1993), Nielsen et al. (1993), Leland (1994), Longstaff and Schwartz (1995), Leland and Toft (1996), Mella-Barral and Tychon (1996), Briys and de Varenne (1997), Crouhy et al. (1998, 2000), Duffie and Lando (1998), and Anderson and Sundaresan (2000). Reviewing this approach would require a separate article (see, e.g., Ammann (1999)). The list of references is not representative of all important papers and books published in this area in recent years, but it includes works that are most related to this presentation.

2 Credit derivatives

Credit derivatives are privately negotiated derivative securities that are linked to a credit-sensitive asset as the underlying asset. More specifically, the reference security of a credit derivative can be an actively-traded corporate or sovereign bond or a portfolio of these bonds. A credit derivative can also have a loan (or a portfolio of loans) as the underlying reference credit. Credit derivatives can be structured in a large variety of ways; they are typically complex agreements, customized to the precise needs of an investor. The common feature of all credit derivatives is the fact that they allow for the transfer of the credit risk from one counterparty to another, so that they can be used to control the credit risk exposure. Credit risk refers to the possibility that a borrower will fail to service or repay a debt on time. The overall risk we are concerned with involves two components: market risk and asset-specific credit risk. In contrast to 'standard' interest-rate derivatives, credit derivatives allow us to isolate and handle not only the market risk, but also the firm-specific credit risk. They also provide a way to synthesize assets that are otherwise not available to a particular investor (in this application, an investor 'buys', rather than 'sells', a specific credit risk).

As in the case of derivative securities associated with the risk-free term structure, we may formally distinguish three main types of agreements: forward contracts, swaps, and options. A forward contract commits the buyer to purchasing a specified bond at a specified future date at a price predetermined at contract inception. In a forward contract, the default risk is normally borne by the buyer. If a credit event occurs, the transaction is marked to market and unwound. Forward contracts can also be transacted in spread form; that is, the agreement can be based on the specified bond's spread over a benchmark asset.
It should be stressed that the classification above does not correspond to market terminological conventions, as described below.


In market practice, the most popular credit-sensitive swap contract is a total rate of return swap, explained in some detail in Section 2.1 below. Credit options are typically embedded in complex credit-sensitive agreements, though over-the-counter traded credit options, such as default puts (also described in Section 2.1), are also available. Let us finally mention the so-called vulnerable options, or more generally, vulnerable claims. These are contingent agreements that are issued by credit-sensitive institutions, so that they are subject to default in much the same way as defaultable bonds.

2.1 Overview of instruments We first review the most actively traded types of credit-sensitive agreements.1 It should be stressed that we do not intend to examine here all aspects of credit derivatives as a tool in the risk management. The non-exhaustive list of examples given below makes it clear that a wide range of objectives can be achieved by trading in credit derivatives. For an extensive analysis of economical reasons which support the use of these products, we refer to Das (1998a, 1998b) or Tavakoli (1998). Total rate of return swaps Total rate of return swaps (total return swaps, for short) are agreements in which the total return of an underlying credit-sensitive asset (basket of assets, index, etc.) is exchanged for some other cash flow. More specifically, one party agrees to pay the total return (income plus or minus any change in the capital value) on a notional principal amount to another party in return for periodic fixed or floatingrate payments on the same notional amount. Let us enumerate the most important features of a total return swap: (a) no principal amounts are exchanged and no physical change of ownership occurs, (b) the maturity of the total return swap agreement need not match that of the underlying, (c) at the contract termination – i.e., at the contract maturity or upon default – according to Das (1998a), ‘a price settlement based on the change in the value of the bond or loan is made’. Total return swaps can incorporate put and call options (to establish caps and floors on the returns of the reference assets), as well as caps and floors on a floating interest rates. Credit-spread swaps and options With credit-spread swaps (that is, relative performance total return swaps), also known as credit-spread forwards, investors pay the total return of one asset while receiving the total return of another credit-sensitive asset. 
Credit-spread options 1 Let us mention that the terminological conventions relative to credit derivatives are not yet fully standardized;

we shall try to follow the most widely accepted terminology.

402

T. R. Bielecki and M. Rutkowski

are option agreements whose payoff is associated with the yield differential of two credit-sensitive assets. For instance, the reference rate of the option can be a spread of a corporate bond over a benchmark asset of comparable maturity. The option can be settled either in cash or through physical delivery of the underlying bond, at a price whose yield spread over the benchmark asset equals the strike spread. Options on credit spreads allow one to isolate the firm-specific credit risk from the market risk. Credit (default) swaps These are agreements in which a periodic fixed payments (or upfront fee) from the protection buyer is exchanged for the promise of some specified payment from the protection seller to be made only if a particular, predetermined credit event occurs. If, during the term of the default swap, a credit event occurs, the seller pays the buyer an amount to cover the loss, and the swap then terminates. If no credit event has occurred by maturity of the swap, both sides end their obligations to each other. The most important covenants of a credit swap contract are: (a) the specification of the credit event, which is formally defined as a ‘default’ (in practice, it may include: bankruptcy, insolvency, payment default, a stipulated price decline for the reference asset, or a rating downgrade for the reference asset), (b) the contingent default payment, which may be structured in a number of ways; for instance, it may be linked to the price movement of the reference asset, or it can be set at a predetermined level (e.g., a fixed percentage of the notional amount of the transaction), (c) the specification of periodic payments which depend, in large part, on the credit quality of the reference asset. Credit swaps are usually settled in cash, but the agreement may also provide for physical delivery; for example, it may involve payment at par by the seller in exchange for the delivery of the defaulted reference asset. 
If the payment is triggered by the default and equals the difference between the face value of a bond and its market price, the contract is called a default swap. Let us finally mention the so-called first-to-default swaps, which are examples of basket default swaps (i.e., default swaps linked to a portfolio of credit-sensitive securities).

Credit (default) options. A credit call (put, resp.) option gives the right to buy (to sell, resp.) an underlying credit-sensitive asset (index, credit spread, etc.) at a predetermined price. The most widely used type of credit option is a default put. The buyer of the default put pays a premium (either an upfront fee or a periodic payment) to the seller, who then assumes the default risk for the reference asset. If there is a credit (default) event during the term of the option, the seller pays the buyer a (fixed or variable) default payment.
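The cash-flow pattern of a default swap described above (periodic premiums until a credit event or maturity, and a contingent default payment) can be illustrated by a minimal sketch. The class `DefaultSwap`, the function `buyer_cash_flows`, and the timing convention (premium paid in every period up to and including the period of the credit event, default payment set as a fixed percentage of the notional) are assumptions made for illustration only; actual contracts specify these details in their covenants.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DefaultSwap:
    """Hypothetical container for the terms of a default swap."""
    notional: float         # notional amount of the transaction
    premium_rate: float     # premium per period, as a fraction of the notional
    n_periods: int          # number of premium periods until maturity
    payout_fraction: float  # fixed default payment, as a fraction of the notional


def buyer_cash_flows(swap: DefaultSwap,
                     default_period: Optional[int] = None) -> List[Tuple[int, float]]:
    """Return (period, amount) pairs seen by the protection buyer.

    Negative amounts are premiums paid; a positive amount is the default
    payment received. `default_period` is the period (1-based) in which the
    credit event occurs, or None if no credit event occurs before maturity.
    """
    premium = -swap.premium_rate * swap.notional
    flows: List[Tuple[int, float]] = []
    for k in range(1, swap.n_periods + 1):
        flows.append((k, premium))
        if default_period is not None and k == default_period:
            # Credit event: the seller pays, and the swap terminates.
            flows.append((k, swap.payout_fraction * swap.notional))
            break
    return flows


swap = DefaultSwap(notional=100.0, premium_rate=0.01, n_periods=4, payout_fraction=0.6)
no_event = buyer_cash_flows(swap)
event_in_period_2 = buyer_cash_flows(swap, default_period=2)
```

In the scenario without a credit event the buyer only pays premiums; in the second scenario the premium stream stops in period 2 and the buyer receives the fixed default payment.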

11. Credit Risk Models: Intensity Based Approach


Credit-linked notes. Credit-linked notes are debt instruments in which the coupon or price of the note is linked to the performance of a reference credit-sensitive asset (rate or index). For instance, a credit-linked note may stipulate that the principal repayment is reduced to a certain level below par if the external corporate or sovereign debt defaults before the maturity of the note. This means that the buyer of the note sells credit protection to the issuer of the note; in exchange, the note pays a higher-than-normal yield.

2.2 Market pricing methods

Since a reliable benchmark model for credit derivatives is not yet available, it is common in market practice to value a credit derivative on a stand-alone basis, using a judiciously chosen ad hoc approach, rather than a sophisticated mathematical model. We shall review the most widely used of these approaches. For explanatory purposes, we focus on the valuation of a default swap, and we base our description of the pricing methods² on BeSaw (1997).

² For an exhaustive analysis of practical aspects of credit swaps and a review of non-technical methods of their valuation (including the estimation of hazard rates), we refer to Duffie (1999).

Same-cost as reference method. To estimate the price of a default swap, one assumes that there exists an insured bond which is otherwise identical to the reference bond of the swap. The spread between the yield of the insured bond and that of the reference bond can then be taken as a proxy for the default swap price. Notice that this method identifies a default swap with bond insurance, and disregards the credit difference between the bond insurer and the default swap counterparty.

Credit-spread-based method. This method of default swap valuation is based on a comparison of the yield of the reference bond and the yield of a risk-free bond with similar maturity. It is thus implicitly assumed that the spread over the risk-free asset is entirely due to the credit risk, so that tax and liquidity effects are neglected. Another difficulty arises when one wishes to price a swap whose maturity does not correspond to the maturity of the reference corporate bond.

Replication of cost method. In this method, the price of a default swap is calculated through evaluation of the cost of a portfolio necessary to replicate the swap. The replication of cost method thus mimics the standard approach to contingent claims valuation in an arbitrage-free setup. Unfortunately, it is typically either impossible or too costly to establish a (static or dynamic) portfolio which fully hedges (i.e. replicates) a credit derivative.

Ratings-based default method. This approach, which will be analysed in more detail in what follows, determines the price of a credit derivative (for instance, a default swap) as the expected loss resulting from default. To derive default probabilities, it is common to model the ratings migration process as a Markov chain, using the estimated credit ratings transition matrix. If the valuation is made on a stand-alone basis, it would be more appropriate to use the firm-specific transition matrix corresponding to the reference asset. Such a matrix is not easily available, however. Similarly, constant (or random) recovery rates, which are needed to evaluate the expected loss, are either inferred from historical data or assessed on a stand-alone basis. The credit-spread-based default method can be seen as a variant of a ratings-based default method. It uses an issuer-specific credit spread over default-free instruments of similar maturity to estimate the probability of default and the expected recovery rate in default.
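The ratings-based default method described above can be sketched numerically as follows. This is only an illustration under simplifying assumptions that are not in the text: a time-homogeneous Markov chain with three hypothetical states (two rating classes and an absorbing default state), an invented one-period transition matrix `P`, a constant recovery rate, and no discounting. The helper names are likewise hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(p: Matrix, n_periods: int) -> Matrix:
    """n-period transition matrix of a time-homogeneous Markov chain."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(n_periods):
        result = mat_mul(result, p)
    return result


def default_probability(p: Matrix, start: int, n_periods: int) -> float:
    """Probability of reaching the default state (last index) within n periods."""
    return mat_power(p, n_periods)[start][-1]


def expected_loss(p: Matrix, start: int, n_periods: int,
                  notional: float, recovery_rate: float) -> float:
    """Undiscounted expected loss: notional * (1 - recovery) * P(default)."""
    return notional * (1.0 - recovery_rate) * default_probability(p, start, n_periods)


# Hypothetical one-period transition matrix; states: 0 = 'A', 1 = 'B', 2 = default.
P = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.00, 0.00, 1.00],  # default is absorbing
]

pd_two_periods = default_probability(P, start=0, n_periods=2)
loss_two_periods = expected_loss(P, start=0, n_periods=2,
                                 notional=100.0, recovery_rate=0.4)
```

Because default is absorbing, the last column of the n-period matrix gives cumulative default probabilities, so the quantity computed here is non-decreasing in the horizon. A market-standard transition matrix would replace `P` in practice; as noted in the text, a firm-specific matrix is rarely available.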

3 Valuation of defaultable claims

The exposition in this section is mainly based on Duffie et al. (1996). Our goal is to present the most fundamental results which can be obtained using the intensity-based approach. In Section 4, special attention will be paid to the various kinds of recovery rates, such as, for instance, zero recovery, fractional recovery of par, and fractional recovery of market value. On the other hand, in order to obtain valuation formulae that are as explicit as possible, we shall still assume that only two states are possible, namely, non-default and default. An analysis of the case of several credit rating classes is postponed to Sections 5–7. We make the following standing assumptions.

(A.1) We are given a probability space $(\Omega, \mathcal{G}, \mathbb{P}^*)$, endowed with the filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{R}_+}$ (of course, $\mathcal{F}_t \subset \mathcal{G}$ for every $t \in \mathbb{R}_+$). The probability measure $\mathbb{P}^*$ is interpreted as a martingale measure for our underlying securities market model (complete or not). Let $\tau$ be a non-negative random variable on the probability space $(\Omega, \mathcal{G}, \mathbb{P}^*)$. In what follows, we shall refer to $\tau$ as the default time. For convenience, we assume that $\mathbb{P}^*\{\tau = 0\} = 0$ and $\mathbb{P}^*\{\tau > t\} > 0$ for every $t \in \mathbb{R}_+$. Given a default time $\tau$, we introduce the associated (single) jump process $H$ by setting $H_t = \mathbf{1}_{\{\tau \le t\}}$ for $t \in \mathbb{R}_+$. It is obvious that $H$ is a right-continuous process. Let $\mathbb{H}$ be the filtration generated by the process $H$,


i.e., $\mathcal{H}_t = \sigma(H_u : u \le t)$. We introduce the enlarged filtration $\mathbb{G}$ which satisfies $\mathbb{G} = \mathbb{H} \vee \mathbb{F}$, that is, $\mathcal{G}_t = \mathcal{H}_t \vee \mathcal{F}_t = \sigma(\mathcal{H}_t, \mathcal{F}_t)$ for every $t$.

(A.2) For a given default-risky security, its default process is modelled through a jump process $H$ with strictly positive intensity (or hazard rate) process³ $\lambda$ under $\mathbb{P}^*$. The intensity $\lambda$ is an $\mathbb{F}$-progressively measurable process such that the compensated process

\[ M_t := H_t - \int_0^{t \wedge \tau} \lambda_u \, du = H_t - \int_0^t h_u \, du, \quad \forall\, t \in [0, T^*], \tag{3.1} \]

follows a $\mathbb{G}$-martingale under $\mathbb{P}^*$. Notice that the auxiliary $\mathbb{G}$-adapted process $h$ satisfies $h_t := \mathbf{1}_{\{t \le \tau\}} \lambda_t$.

³ We refer to Artzner and Delbaen (1995), Kusuoka (1999), Rutkowski (1999), Elliott et al. (2000) or Jeanblanc and Rutkowski (2000a, 2000b) for more details on stochastic intensities.

Remarks. Let us stress that the stochastic intensity $\lambda$ is assumed to follow an $\mathbb{F}$-adapted process, and the reference filtration $\mathbb{F}$ can be strictly smaller than $\mathbb{G}$, in general. On the other hand, the case of an $\mathbb{F}$-stopping time is also covered (in this case, $\mathbb{F} = \mathbb{G}$).

(A.3) Given a maturity date $T > 0$, an $\mathcal{F}_T$-measurable random variable $X$ represents the promised claim, that is, the amount of cash which the owner of a defaultable claim is entitled to receive at time $T$, provided that the default has not occurred before the maturity date $T$.

(A.4) An $\mathbb{F}$-predictable process $Z$ models the payoff which is actually received by the owner of a defaultable claim, if default occurs before maturity $T$. We shall refer to $Z$ as the recovery process of $X$.

(A.5) An $\mathbb{F}$-adapted process $r$ stands for the short-term interest rate, and $B_t := \exp\big(\int_0^t r_u \, du\big)$, $t \in \mathbb{R}_+$, is the associated savings account process.

The main result in the intensity-based approach states that a defaultable security can be priced as if it were a default-risk-free security, provided that the credit spread is already incorporated in the risk premium. In other words, the risk premium process of a defaultable security differs from that associated with a risk-free bond, both in the real world and in the risk-neutral world. In particular, in a risk-neutral world the risk premium associated with a risk-free bond vanishes, but the risk premium associated with a defaultable security is still present.
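The compensator property in (3.1) can be checked numerically in a very special case. The sketch below assumes a deterministic intensity of the hypothetical form λ(t) = a + b·t (so the reference filtration plays no role), samples τ by inverting the cumulative hazard against a unit exponential variable, and estimates the mean of M_t, which should be close to zero since M_0 = 0 and M is a martingale. The function names and parameter values are illustrative and do not come from the text.

```python
import math
import random


def cumulative_hazard(t: float, a: float, b: float) -> float:
    """Integral of the assumed intensity lambda(u) = a + b*u over [0, t]."""
    return a * t + 0.5 * b * t * t


def sample_default_time(rng: random.Random, a: float, b: float) -> float:
    """Draw tau by solving cumulative_hazard(tau) = E with E ~ Exp(1)."""
    e = rng.expovariate(1.0)
    if b == 0.0:
        return e / a  # constant intensity: tau is exponential with rate a
    return (-a + math.sqrt(a * a + 2.0 * b * e)) / b


def mean_compensated_process(t: float, a: float, b: float,
                             n_paths: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[M_t], M_t = H_t - int_0^{min(t, tau)} lambda_u du."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        tau = sample_default_time(rng, a, b)
        h_t = 1.0 if tau <= t else 0.0
        total += h_t - cumulative_hazard(min(t, tau), a, b)
    return total / n_paths


def survival_probability(t: float, a: float, b: float) -> float:
    """P{tau > t} = exp(-cumulative hazard) for a deterministic intensity."""
    return math.exp(-cumulative_hazard(t, a, b))


estimate = mean_compensated_process(t=2.0, a=0.05, b=0.02, n_paths=50_000)
```

The estimate differs from zero only by Monte Carlo noise. With a genuinely stochastic intensity the same check would require simulating the driving process as well, which this sketch does not attempt.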


Example 3.1 If the intensity process $\lambda_t = \lambda > 0$ is constant, the process $H$ can be seen as a continuous-time Markov chain with the state space $\{0, 1\}$, and with constant intensity matrix $\Lambda = [\lambda_{ij}]_{0 \le i,j \le 1}$, where $\lambda_{00} = -\lambda$, $\lambda_{01} = \lambda$, and $\lambda_{1i} = 0$ for $i = 0, 1$ (so that the state 1 is absorbing). In this case, $\tau$ can be seen as the first jump time of a standard Poisson process $N$ with constant intensity $\lambda$.

This simple example can be generalized in two directions. First, in some circumstances it might be natural to assume that $\lambda_t = \lambda(Y_t)$, where $Y$ is a given $k$-dimensional $\mathbb{F}$-adapted stochastic process, and $\lambda : \mathbb{R}^k \to \mathbb{R}_+$ is a positive deterministic function. Second, the basic model can be extended to accommodate different credit rating classes, $\Lambda_t = [\lambda_{ij}(Y_t)]_{0 \le i,j \le K}$, with $K$ being an absorbing state (see, e.g., Jarrow et al. (1997) or Section 6).

We first need to formally define the value process $S$ of a (European) defaultable claim, represented by a triplet $(X, Z, \tau)$ and maturity date $T$. Since we assume throughout that $\mathbb{P}^*$ is a spot martingale measure, it is natural to postulate that the value $S_0$ at time 0 of a defaultable claim $(X, Z, \tau)$ equals

\[ S_0 := B_0 \, \mathbb{E}_{\mathbb{P}^*}\Big( \int_{]0,T]} B_u^{-1} \, dD_u \Big), \tag{3.2} \]

where $B$ stands for the savings account process, and $D$ is the ‘dividend process’ (cf. (A.3)–(A.4))

\[ D_t = \int_{]0,t]} Z_u \, dH_u + X (1 - H_T) \mathbf{1}_{\{t = T\}}. \tag{3.3} \]
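To make (3.2)–(3.3) concrete, consider the simplest special case, which is an assumption of this sketch rather than a setting treated in the text: constant short rate r, constant intensity λ as in Example 3.1, a constant promised payoff X and a constant recovery Z paid at the default time. Then B_0 = 1 and the expectation in (3.2) reduces to Z·λ/(r+λ)·(1 − e^{−(r+λ)T}) + X·e^{−(r+λ)T}. The code compares this closed form with a Monte Carlo estimate; all function names and parameter values are hypothetical.

```python
import math
import random


def defaultable_claim_value(r: float, lam: float, T: float,
                            X: float, Z: float) -> float:
    """Closed-form S_0 under constant r and lam (requires r + lam > 0).

    Recovery Z is paid at tau if tau <= T; X is paid at T if tau > T.
    """
    k = r + lam
    recovery_leg = Z * lam / k * (1.0 - math.exp(-k * T))
    promised_leg = X * math.exp(-k * T)
    return recovery_leg + promised_leg


def monte_carlo_value(r: float, lam: float, T: float, X: float, Z: float,
                      n_paths: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expectation of the discounted dividend stream."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        tau = rng.expovariate(lam)  # default time with constant intensity lam
        if tau <= T:
            total += Z * math.exp(-r * tau)  # recovery, discounted from tau
        else:
            total += X * math.exp(-r * T)    # promised claim, discounted from T
    return total / n_paths


closed_form = defaultable_claim_value(r=0.03, lam=0.02, T=5.0, X=1.0, Z=0.4)
simulated = monte_carlo_value(r=0.03, lam=0.02, T=5.0, X=1.0, Z=0.4, n_paths=50_000)
zero_recovery = defaultable_claim_value(r=0.03, lam=0.02, T=5.0, X=1.0, Z=0.0)
```

With zero recovery the value is X·e^{−(r+λ)T}, that is, the promised payoff discounted at the default-adjusted rate r + λ. This is one concrete reading, under the stated assumptions, of the remark that a defaultable security can be priced like a default-free one once the credit spread is built into the discounting.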

Formula (3.2) can be easily generalized to give the price of a defaultable claim at any date $t$, namely

\[ S_t := B_t \, \mathbb{E}_{\mathbb{P}^*}\Big( \int_{]t,T]} B_u^{-1} \, dD_u \,\Big|\, \mathcal{G}_t \Big), \tag{3.4} \]

or equivalently,

\[ S_t := B_t \, \mathbb{E}_{\mathbb{P}^*}\Big( \int_{]t,T]} B_u^{-1} Z_u \, dH_u + B_T^{-1} X \, \mathbf{1}_{\{T \,\ldots} \]
