The following books represent a core reading and reference list for anyone interested in applied probability modelling, with applications in marketing. Many of these books may at first appear rather daunting to those without a solid background in probability and statistics. Don't be scared off; with perseverance and practice, the material becomes more accessible.
General Probability and Statistics
A working knowledge of probability and statistics is necessary for anyone interested in developing and applying probability models. A classic introductory textbook (at the advanced undergraduate level) is Mood, Graybill, and Boes (1974); its coverage of the material is concise, making it an excellent reference for the basic concepts.
Ghahramani (2000) and Ross (2002) provide a good introduction to the basic concepts and tools of probability upon which applied probability models are built. Another useful book is Ross (2003) -- only pay attention to the earlier chapters, as topics such as queuing theory are not that relevant to most marketing modellers.
Two truly classic references are Feller (1968) and Stuart and Ord (1994). Many regard Feller as the bible of discrete probability. Stuart and Ord is a comprehensive reference on statistical distributions and related sampling theory.
The building blocks of any probability model are probability distributions. The five-volume set of books by Johnson and Kotz (et al.) provides extremely detailed information on almost any probability distribution. These classic references cover discrete univariate distributions (Johnson, Kotz, and Kemp 1992), continuous univariate distributions (Johnson, Kotz, and Balakrishnan 1994, 1995), discrete multivariate distributions (Johnson, Kotz, and Balakrishnan 1997) and continuous multivariate distributions (Kotz, Balakrishnan, and Johnson 2000). Probability modellers soon learn their way around these books ... and wonder how they could live without them!
Wimmer and Altmann (1999) is a thesaurus that provides information on about 750 univariate discrete distributions.
Evans, Hastings, and Peacock (2000) provides a brief summary of the key properties and formulae associated with forty major probability distributions. It is certainly less daunting than the "Johnson and Kotz" books. Walck (1996) is a more detailed -- and free -- alternative reference.
The theory of stochastic processes underlies much of the applied probability modelling work within marketing. There are a number of good textbooks targeted at first-year graduate students. Four such books are Karlin and Taylor (1975), Parzen (1962), Resnick (1992), and Ross (1996).
Most basic probability models ignore any covariates. However, in many cases, the inclusion of covariate effects is central to the modelling problem.
The topic of "regression methods" for count data has received a lot of attention within the econometrics literature. Cameron and Trivedi (1998) provides a summary of the developments to date. Winkelmann (2003) is less comprehensive in its coverage but tends to cover the material in greater detail.
Just as traditional regression methods cannot be used to analyse count data, they cannot be used to determine the effects of covariates when dealing with so-called timing data. The following four books are standard references on the analysis of timing (or duration, event-time, failure time, survival, etc.) data and all include material on how to incorporate the effects of covariates in the basic models used to analyse these data.
Finite Mixture Models
McLachlan and Peel (2000) provides an up-to-date account of the theory and applications of modelling using finite mixture distributions. Wedel and Kamakura (2000) provides extensive coverage of finite mixture models as applied to the problem of market segmentation.
Greene (1982) is basically a short monograph on the NBD and beta-binomial models, focusing both on the properties of these models and their application to practical marketing problems.
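For readers meeting the NBD for the first time, the following is a minimal sketch of its probability mass function, assuming the usual construction in which purchase counts are Poisson and the purchase rate varies across customers according to a gamma distribution. The function names and the parameter names `r` (gamma shape), `alpha` (gamma rate), and `t` (length of the observation period) are illustrative choices, not notation taken from any of the books listed here.

```python
import math


def nbd_pmf(x, r, alpha, t=1.0):
    """P(X = x) for a Poisson count whose rate is gamma(r, alpha) distributed.

    r, alpha and t are hypothetical parameter names: gamma shape, gamma rate,
    and the length of the observation period, respectively.
    """
    # Work on the log scale so that large counts do not overflow.
    log_p = (
        math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(alpha / (alpha + t))
        + x * math.log(t / (alpha + t))
    )
    return math.exp(log_p)


def nbd_mean(r, alpha, t=1.0):
    """Expected count over a period of length t."""
    return r * t / alpha
```

As a quick check, the probabilities returned by `nbd_pmf` should sum to (approximately) one over a long enough range of `x`, and the probability-weighted average of `x` should match `nbd_mean`.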
Ehrenberg (1988) is a treatise on the NBD and Dirichlet-multinomial models. However, the focus is on the descriptive (as opposed to predictive) application of these models and their ability to capture "empirical generalizations", downplaying the mathematics of the models. (An electronic copy of this book is available as part of Volume 5 of the Journal of Empirical Generalisations in Marketing Science.)
Massy, Montgomery, and Morrison (1970) focuses on Bernoulli models, Markov models, and timing models. While much of the material is now dated, its coverage of Markov models is perhaps still the best within the marketing literature.
The basic probability models used by marketing modellers are by no means unique to marketing. Many fields (e.g., ecology, bibliometry) use many of the same types of models -- albeit with different data.
Klugman, Panjer, and Willmot (2004) is an excellent book that focuses on applications within the insurance industry. Mastery of the material in this book would result in a solid knowledge base that could be applied to marketing problems.
Morgan (2000) focuses on the development and application of stochastic models in the field of biology. Included in this book are a number of short MATLAB programmes that implement the various models discussed in the book.
Vose (2000) is a book on quantitative risk analysis using simulation methods. It includes chapters on the basic stochastic processes that are fundamental to risk analysis modelling, and the fitting of probability distributions to available data.
When working with probability models that seek to capture multiple behavioural components (e.g., counting and choice), it is common to come across some non-standard mathematical functions (e.g., the Gaussian hypergeometric function). Abramowitz and Stegun (1972) is a classic -- and very cheap -- reference for such functions. A project is underway at the (US) National Institute of Standards and Technology to develop a replacement for this book; see http://dlmf.nist.gov/. (Note: the Wolfram Functions Site is also a great reference for anyone working with special mathematical functions.)
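To give a sense of what such a function looks like in practice, the sketch below evaluates the Gaussian hypergeometric function from its defining power series, which converges for |z| < 1. The function name `hyp2f1_series` and its tolerance arguments are hypothetical; this is an illustration of the definition rather than a production-quality routine.

```python
def hyp2f1_series(a, b, c, z, tol=1e-12, max_terms=10_000):
    """Approximate 2F1(a, b; c; z) by summing its power series.

    The series is sum_n (a)_n (b)_n / (c)_n * z**n / n!, where (q)_n is the
    rising factorial. This sketch only handles |z| < 1.
    """
    if abs(z) >= 1:
        raise ValueError("the power series only converges for |z| < 1")
    term = 1.0   # the n = 0 term
    total = 1.0
    for n in range(max_terms):
        # Ratio of consecutive terms: (a+n)(b+n) / ((c+n)(n+1)) * z
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total
```

Two known special cases make convenient checks: 2F1(1, 1; 2; z) equals -ln(1 - z)/z, and 2F1(a, b; b; z) equals (1 - z)^(-a). For real work, a library routine such as SciPy's `scipy.special.hyp2f1` would normally be preferred to a hand-rolled series.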
In the process of developing such models, it is common to be faced with some rather scary integrals. An extremely useful reference is Gradshteyn and Ryzhik (2000), as it is probably the best compilation of integrals and their solutions in existence. It is very likely that, whatever integral you are trying to solve, the answer will be in this book!
A basic familiarity with calculus is necessary to really work with probability models. Thompson and Gardner (1998) is a revised edition of Thompson's classic introduction to calculus. This very readable book serves as a primer, or as a refresher for anyone who studied calculus "years ago" and has forgotten it.
MATLAB is an excellent software package for high-performance numerical computation and visualization. As the title suggests, Pratap (2002) provides a quick introduction to MATLAB. Sigmon and Davis (2002) is a handy pocket-sized (literally) introductory reference. (Note that new editions of both books will be appearing in 2005, in light of the release of MATLAB 7.) Martinez and Martinez (2002) is a useful book for anyone interested in using MATLAB to develop statistical models.