1. Terminology:
2. Multi-Factor Model
3. Multi-Factor Risk Model
4. 量化择时
5. The low risk effect
6. Factor Models and General Definition
7. Factor Selection
8. Least Squares Estimation (LSE) and Kalman Filtering (KF) for Factor Modeling: A Geometrical Perspective
9. A Regularized Kalman Filter (rgKF) for Spiky Data
10. Interview question:

1 Terminology:

ad hoc

Specific, for particular purpose.

per sec

By or in itself or themselves; intrinsically.

excess return

Return relative to the risk-free return. $ r = r_a - r_f $

expected return
active return

Return relative to a benchmark.

alpha

Expeced residual return, or expected exceptional return, or realized residual, or exceptional return.

Residual return plus benchmark timing return. $\Theta = \beta*(\text{residual return + benchmark return exceeds consensus expected return})$

2 Multi-Factor Model

2.1 What is a 'Multi-Factor Model'

A multi-factor model is a financial model that employs multiple factors in its computations to explain market phenomena and/or equilibrium asset prices. The multi-factor model can be used to explain either an individual security or a portfolio of securities.

2.2 Why a Multi-Factor Model?

Computing sample statistics directly from historical data, however, is fraught with danger.

Weak signals
noise aside,
when a new asset enters the existing universe, there is no reliable way of calculating its relationships with the other assets,
data points totalling no less than the number of assets are required to accurately estimate all the variances and covariances directly. Even with a universe of 100 assets, over 1 2 · 100 · (100+ 1) > 5; 000 relationships need to be estimated. For stock markets like the U.S. (over 12,000 assets), this becomes completely infeasible.
A better approach is to first impose some structure on the asset returns by identifying common factors within the market | that is, factors which drive asset returns.

Factors used in multi-factor models can fall into several broad categories:

Fundamental factors
- Industry and country factors reflect a company’s line of business and country of domicile.
- Style factors encapsulate the financial characteristics of an asset | a company’s size, debt levels, liquidity, etc. They are usually calculated from a mixture of market and fundamental (i.e. balance sheet) data.
- Currency factors represent the interplay between local currencies of the various assets within the model.
- Macroeconomic factors capture an asset’s sensitivity to variables such as GNP growth, bond yields, inflation, etc.
Statistical factors

are mathematical constructs responsible for the observed correlations in asset returns. They are not directly connected to any observable real-world phenomena, and may change from one period to the next.

r = Bf + u

where r is the asset returns vector at time t, f is the factor return vector, u the set of asset specific returns.

2.3 BREAKING DOWN 'Multi-Factor Model'

Multi-factor models are used to construct portfolios with certain characteristics, such as risk, or to track indexes.

2.4 Categories of Multi-Factor Models

Multi-factor models can be divided into three categories: macroeconomic models, fundamental models and statistical models.
Macroeconomic models compare a security's return to such factors as employment, inflation and interest.
Fundamental models analyze the relationship between a security's return and its underlying financials, such as earnings.
Statistical models are used to compare the returns of different securities based on the statistical performance of each security in and of itself.

2.5 Beta

The beta of a security measures the systemic risk of the security in relation to the overall market.

A beta of 1 indicates that the security theoretically experiences the same degree of volatility as the market and moves in tandem with the market.
A beta greater than 1 indicates the security is theoretically more volatile than the market.
Conversely, a beta less than 1 indicates the security is theoretically less volatile than the market.

2.6 Multi-Factor Model Formula

Ri = ai + _i(m) * Rm + _i(1) * F1 + _i(2) * F2 +…+_i(N) * FN + ei

Where:

Ri is the returns of security i
Rm is the market return
F(1, 2, 3 … N) is each of the factors used
_ is the beta with respect to each factor including the market (m)
e is the error term
a is the intercept

2.7 Fama and French Three-Factor Model

2.7.1 size of firms, SMB (small minus big)

SMB accounts for publicly traded companies with small market caps that generate higher returns

2.7.2 book-to-market values, HML (high minus low)

HML accounts for values stocks with high book-to-market ratios that generate higher returns in comparison to the market.
HML accounts for the spread in returns between value and growth stocks and argues that companies with high book-to-market ratios, also known as value stocks, outperform those with lower book-to-market values, known as growth stocks.

2.7.3 excess return on the market, portfolio's return less the risk free rate of return

2.8 Fundamental Factor Model

Style Factors	Description
负债率因子暴露（mark-cap）	Total debt to market capitalization
价值因子暴露（mark-cap）	Book-to-price
短期动量因子暴露（mark-cap）	Cumulative return over past month
成长因子暴露（mark-cap）_L	Plowback times return-on-equity
规模因子暴露（mark-cap）	Natural logarithm of market capitalization
中期动量因子暴露（mark-cap）	Cumulative return over past year excluding most recent month
换手率因子暴露（mark-cap）	1 month average daily volume over market capitalization
波动率因子暴露（mark-cap）	3 month average of absolute return over cross-sectional standard deviation
Industry Factors (32)	GICS-based industry classification with 0/1 assignments.(see the additional AX-CN Further Results Appendix)

2.9 Statistical Factor Model

Factor Structure	15 statistical factors
Estimation	2-Pass Asymptotic Principal Components factor analysis with residual variance adjusted returns.

2.10 多因子选股

打分法较常见

根据各个因子的大小对股票进行打分,然后按照一定的权重加权得到一个总分,再根据总分对股票进行筛选.

打分法优点是相对比较稳健,不容易受到极端值的影响.

回归法

用过去的股票的收益率对多因子进行回归,得到一个回归方程,然后把最新的因子代入回归方程得到一个对未来股票收益的预判,最后以此为依据进行选股.

回归法的优点是能够比较及时地调整股票对各因子的敏感性,而且不同的股票对不同的因子的敏感性也可以不同. 回归法的缺点是容易受到极端值的影响(OLS),在股票对因子敏感度变化较大的市场情况下效果也比较差.

2.10.1 策略模型

候选因子的选取
选股因子的有效性检验

第一个月初计算市场每只正常交易股票的该因子的大小,按从小到大的顺序对样本股票进行排序,并平均分为n个组合,一直到月末,在下月初再按同样的方法重新构建n个组合.
有效但冗余因子的剔除
综合评分模型的建立
模型的评价

3 Multi-Factor Risk Model

3.1 why multi factor risk model and the usage

The asset covariance matrix is critical both for portfolio construction and for risk management purposes. A key challenge in estimating the asset covariance matrix lies in the sheer dimensionality of the problem. For instance, an active portfolio containing 2000 stocks multiplies 1000 days requires more than two million independent elements. If the asset covariance matrix is computed naively — that is, by brute force — then the matrix is likely to be extremely ill-conditioned. This makes the asset covariance matrix highly susceptible to noise and spurious relationships that are unlikely to persist out-of-sample. For instance, if the number of time observations is less than the number of stocks (as would be typical for large portfolios), the matrix is said to be “rank deficient,” meaning that it is possible to construct apparently riskless portfolios.

Factor risk models were developed to provide a more robust solution to this problem. Stock returns are attributed to a factor component that affects all stocks, and an idiosyncratic component that is unique to the particular stock.

Consider a portfolio with weights $w_n$ , and return given by \[R_P = \displaystyle\sum_{n} w_n r_n.\]

The portfolio factor exposures are given by the weighted average of the asset exposures, i.e., \[X_k^P = \displaystyle\sum_{n}w_n X_{nk}\].

3.2 structure

3.2.1 four components:

a stock's exposures to the factors
its excess returns
the attributed factor returns
the specific returns.

\[r_n(t) = \displaystyle\sum_{k} X_{n,k}f_k(t)+u_n(t)\], where

$X_{n,k}$ = exposure of asset n to factor k.

$r_n(t)$ = the excess return(return above the risk free return) on stock n during the period from t to time t+1.

$f_k(t)$ = the factor return to factor k during the period from time t to time t + 1,

$u_n(t)$ = stock n's specific return during the period from time t to time t + 1. This is the return that cannot be explained by the factors.

3.2.2 risk structure:

\[ V_{n,m} = \displaystyle\sum_{k1,k2=1}^{K} X_{n,k1} F_{k1,k2} X_{m,k2} + \delta_{n,m} \] where

$V_{n,m}$ = the covariance of asset n with asset m (if n = m, this gives the variance of asset n),

$X_{n,k1}$ = the exposure of asset n to factor k1,

$F_{k1,k2}$ = the covariance of factor k l with factor k2 (if k l = k2, this gives the variance of factor kl), and

$\delta_{n,m}$ = the specific covariance of asset n with asset m. By assumption, all specific risk correlations are zero, so this term is zero unless n = m. In that case, this term gives the specific variance of asset n.

3.3 building the model

3.3.1 choosing the factors

all factors must be a priori factor. even though the factor returns are uncertain, the factor exposures must be known at the beginning of the period.

The first source, due to factors, represents the systematic component.
The second source represents the diversifiable component that cannot be explained by the factors, and is therefore deemed idiosyncratic or asset specific.

Responses to external influences
These factors include
- responses to return in the bond market (sometimes called bond beta)
- unexpected changes in inflation (iation surprise)
- changes in oil prices
- changes in exchange rates
- changes in industrial production
1. defects:
  - the response coefficient has to be estimated through regression analysis or some similar technique. This requirement leads to errors in the estimates, commonly called the error-in-variables problem.
  - the estimates are based on behavior during a past period, generally five years. Even if these past estimates are accurate in the statistical sense of capturing the true situation in the past, they may not accurately describe the present. In short, these response coefficients can be nonstationary.
  - A key assumption underlying factor risk models is that the factors capture all systematic drivers of asset returns, thus implying that the specific returns are mutually uncorrelated.
  - Another important characteristic of a high-quality factor structure is parsimony, meaning that the systematic component of asset returns is explained with the fewest possible number of factors.
2. key of success choosing factors
  - statistical significance of the factor returns. In particular, the statistical significance of the factors should be persistent across time, and not due to isolated events that are unlikely to recur in the future.
  - Stability is another characteristic of a high-quality factor structure. Stability means that typical stock exposures do not change drastically over short periods of time.
3. data preparation
  multi-step algorithm to identify and treat outliers. The algorithm assigns each observation into one of three groups.
  - The first group represents values so extreme that they are treated as potential data errors and removed from the estimation process.
  - The second group represents values that are regarded as legitimate, but nonetheless so large that their impact on the model must be limited. We trim these observations to three standard deviations from the mean.
  - The third group of observations, forming the bulk of the distribution, consists of values that are less than three standard deviations from the mean; these observations are left unadjusted.
cross-sectional comparisons of asset attributes

These factors, which have no link to the remainder of the economy, compare attributes of the stocks.
1. fundamental
  - dividend yield
  - earnings yield
  - analysts' forecasts of future earnings per share
2. market
  - volatility
  - return during a past period
  - option-implied volatility
  - share turnover
purely internal or statistical factors
It is possible to amass returns data on a large number of stocks, turn the crank of a statistical meat grinder, and admire the factors the machine produces: factor ex machina. methods:
- principal component analysis
- maximum likelihood analysis
- expectations maximization analysis
overall criteria:
- incisive
- intuitive
- interesting

3.3.2 estimating factor returns

exposures
1. industry exposures
  
  The market itself has unit exposure in total to the industries. Because large corporations can do business in several industries, the industry factors must account for multiple industry memberships.
2. risk index exposures
  - Volatilitydistinguishes stocks by their volatility. Assets that rank high in this dimension have been and are expected to be more volatile than average.
  - Momentum distinguishes stocks by recent performance.
  - Size distinguishes large stocks from small stocks.
  - Liquidity distinguishes stocks by how often their shares trade.
  - Growth distinguishes stocks by past and anticipated earnings growth.Value distinguishes stocks by their fundamentals, including ratios of earnings, dividends, cash flows, book value, and sales to price; is the stock cheap or expensive relative to fundamentals?
  - Earnings volatility distinguishes stocks by their earnings stability.
  - Financial leverage distinguishes firms by their debt-to-equity ratios and exposure to interest rate risk.
  1. Relying on several different descriptors can improve model robustness.
  2. all raw exposure data must be rescaled:
    
    \[X_{normalized} = \frac{X_{raw} - mean(X_{raw})}{SD(X_{raw})}\]
returns
- Given exposures to the industry and risk index factors, the next step is to estimate returns via multiple regressions.
- The R2 statistic, which measures the explanatory power of the model, tends to average between 30 percent and 40 percent for models of monthly equity returns with roughly 1,000 assets and 50 factors. Larger R' statistics tend to occur in months with larger market moves.
- formly: least squares regressions, weighting each observed return by the inverse of its specific variance.
- Although these cross-sectional regressions can involve many variables (the USE2 model uses 68 factors), the models do not suffer from multicollinearity. Most of the factors are industries (55out of 68 in USEZ), which are orthogonal. In addition, tests of variance inflation factors, which measure the inflation in estimate errors attributable to multicollinearity, lie far below serious danger levels.
Factor Portfolios

\[f=(X^T W X)^{-1} (X^T W r)\], where X is the exposure matrix, W is the diagonal matrix of regression weights, and r is the vector of excess returns. \[f_k = \displaystyle\sum_{n=1}^{N} C_{k,n}r_n\]
1. Factor Covariance
  
  Once the factor returns each period are estimated, we can estimate a factor covariance matrix-an estimate of all the factor variances and covariances.
2. Specific-Risk Matrixes
  
  $u_n$, is that component of its return that the model cannot explain. So the multiple-factor model can provide no insight into stock-specific returns.
  
  For specific risk, we need to model specific return variance, $u_n^2$ assuming that mean specific return is zero.
  
  In general, the model for specific risk is \[ u_n^2(t) = S(t)[1+v_n(t)]\] with \[(1/N)\displaystyle\sum_{n=1}^{N} u_n^2(t) = S(t), (1/N)\displaystyle\sum_{n=1}^{N} v_n(t) = 0 \] S(t) measures the average specific variance across the universe of stocks, and $v_n$ captures the cross-sectional variation in specific variance.

3.3.3 forecasting risk

we use a time series model for S(t) and a linear multiple-factor model for vn(t). Models for vn(t) typically include some risk index factors, plus factors measuring recent squared specific returns.

3.3.4 Data Requirements

stock returns

adjusted stock close price.
sufficient industry identification data to calculate factor exposures
- earnings
- sales
- assets segmented by industry
- historical returns
- associated option information
- fundamental accounting data
- earnings forecasts

3.4 Model validity

three categories: in-sample tests, out-of-sample tests, and empirical observations.

3.4.1 in-sample tests

$R^2$ roughly 50 factors to explain the returns to roughly 1,000 assets each month.
Monthly R2statistics for the models average about 30-40 percent, meaning that the model "explains," on average, about 30-40 percent of the observed cross-sectional variance of the universe of stock returns.
- the R2 statistic can vary quite significantly from month to month, depending in part on the overall market return.
ModelR2statistics are highest when the market return differs very signdicantly from zero. The R2statistic was very high in October 1987 because the market return was so extreme. In months when the market return is near zero, the R2 statistic can be quite low, even if discrepancies between realized and modeled returns are small
root mean square error from the regression.

3.4.2 out-of-sample tests

Out-of-sample tests compare forecast risk with realized risk.

One out-ofsample test builds portfolios of randomly chosen assets and then compares the forecast and realized active risk of those portfolios.

3.4.3 empirical observations

concerning model validity are more vague than statistical tests, but they are still relevant.

3.4.4 factor multi-collinearity多重共线性：

解释：

多重共线性是指线性回归模型中的解释变量之间由于存在精确相关关系或高度相关关系而使模型估计失真或难以估计准确。
解决办法

（1）排除引起共线性的变量,PCA降维。找出引起多重共线性的解释变量，将它排除出去，以逐步回归法得到最广泛的应用。（2）差分法EMB 时间序列数据、线性模型：将原模型变换为差分模型。（3）减小参数估计量的方差：岭回归法（Ridge Regression）。

3.5 Applications of Multiple-Factor Risk Model

the attribution of asset returns to chosen common factor and specific returns,
forecasts of the variances and covariances of these common factor and specific returns.

3.5.1 The Present: Current Portfolio Risk Analysis.

3.5.2 Portfolio Construction

risk-adjusted expected return: \[ U = \displaystyle\sum_{n=1}^{N}h_n r_n - \lambda \displaystyle\sum_{n,m=1}^{N} h_n V_{n,m} h_m\] $h_n$ is the holding of asset n, $r_n$ is the expected return to asset n, $\lambda$ is a risk aversion parameter. $V_{n,m}$ covariance coming from multiple-factor risk model.

3.5.3 Performance Analysis

3.6 Factor Exposures

3.7 Factor Returns

3.8 Factor Covariance Matrix

3.9 Specific Risk

4 量化择时

4.1 技术指标

4.2 投资者情绪

4.2.1 投资信心

4.2.2 折溢价率指标

转债转股溢价率, 封闭型基金的折价

4.2.3 新股指标

IPO的数量, IPO首日涨幅和市场定价一般受市场情绪的影响.

如果市场情绪高昂, 涨幅较大, PE高; 如果市场情绪低迷, 上市有可能跛破发行价, 市场参与者比较少, 公司上市意愿不强, IPO数量减少, 同时政策也将会停止新股审批.

4.2.4 市场指标

资金流量,创新高,新低股票,涨跌停股票数量,上涨家数百分比.

4.2.5 investor behavior

新增开户数,基金仓位,卖空比例,保证金交易,资金出入.

4.3 择时策略

4.4 SVM

传统统计模式的缺点:

传统的统计模式识别方法,只有在样本趋向无穷大时,其性能才有理论的保证.
传统的统计模式识别方法, 包括BP神经网络等,在进行机器学习时,强调经验风险最小化, 而单纯的经验风险最小化会产生学习问题,其推广能力较差.
传统的统计模式识别方法存在局部极小值的问题.

训练参数输入阶段的任务主要是确定SVM模型的参数$gamma$ 和 $sigma$.

5 The low risk effect

5.1 What is the Low Risk Effect?

beta and volatility are related, though not the same. Beta depends on volatility and correlation to the market, whereas volatility is related to idiosyncratic risk (see here for an explanation of how to calculate the different measures).

5.2 Why Does the Low Risk Effect Exist?

The first is that it’s related to one of the assumptions of the CAPM, which is that there are no constraints on either leverage or short-selling. However, in the real world, many investors are either constrained against the use of leverage (by their charters) or have an aversion to its use. The same is true of short-selling, and the borrowing costs for hard-to-borrow stocks can be quite high. Such limits can prevent arbitrageurs from correcting mispricings.

Another assumption of the CAPM is that markets have no frictions, meaning there are neither transaction costs nor taxes. And, of course, that isn’t true in the real world. And the historical evidence shows that the most mispriced stocks are the ones with the highest costs of shorting.

The explanation for the low-volatility/low-beta anomaly, then, is that, faced with constraints and frictions, investors looking to increase their return choose to tilt their portfolios toward high-beta securities to garner more of the equity risk premium. This extra demand for high-volatility/high-beta securities, and reduced demand for low-volatility/low-beta securities, may explain the flat (or even inverted) relationship between risk and expected return relative to the predictions of the CAPM model.

Another explanation long posited in the literature is that constraints on short-selling can cause stocks to be overpriced — in a market with little or no short selling, the demand for a particular security comes from the minority who hold the most optimistic expectations about it. This phenomenon is also referred to as the winner’s curse. Divergence of opinion is likely to increase with risk — high-risk stocks are more likely to be overpriced than low-risk stocks because their owners have the greatest bias.

Another explanation for the low-risk premium comes from the fact that while one of the assumptions under the CAPM is that investors are risk-averse, we know that in the real world there are investors with a “taste,” or preference, for lottery-like investments — investments that exhibit positive skewness and excess kurtosis (example of this research is here). This leads them to “irrationally” (from an economist’s perspective) invest in high-volatility stocks (which have lottery-like distributions) despite their poor returns. In other words, they pay a premium to gamble. Among the stocks that fall into the “lottery ticket” category are IPOs, small-cap growth stocks that are not profitable, penny stocks and stocks in bankruptcy. Again, limits to arbitrage and the costs and fear of shorting prevent rational investors from correcting the mispricings.

5.3 The Latest Research on the Low Risk Effect

The latest contribution to the literature on the low-risk phenomenon is from Cliff Asness, Andrea Frazzini, Niels Joachim Gormsen and Lasse Heje Pedersen, authors of the January 2017 study “Betting Against Correlation: Testing Theories of the Low-Risk Effect.” They suggest that if the low-risk effect is driven by leverage constraints, risk should be measured using systematic risk (beta). On the other hand, if the low-risk effect is driven by behavioral effects, then risk should be measured using idiosyncratic risk (volatility) — stocks with low idiosyncratic risk outperform stocks with high idiosyncratic risk.

The authors noted that “the challenge with the existing literature is that it seeks to run a horse race between factors that are, by construction, highly correlated since risky stocks are usually risky in many ways… Hence, the most powerful way to credibly distinguish these theories is to construct a new factor that captures one theory while at the same being relatively unrelated to factors capturing the alternative theory. To accomplish this, we decompose BAB into two factors: betting against correlation (BAC) and betting against volatility (BAV). BAC goes long stocks that have low correlation to the market and shorts those with high correlation, while seeking to match the volatility of the stocks that are bought and sold.” Note that stocks with low correlation to the market are likely to have low betas. They write: “Likewise, BAV goes long and short based on volatility, while seeking to match correlation. This decomposition of BAB creates a component that is relatively unrelated to the behavioral factors (BAC) and a closely related component (BAV).”

6 Factor Models and General Definition¹

6.1 Introduction

The expected excess return (or risk premium) of asset i is the return we can expect from asset i in excess of rf, $expected excess return = expected return - r_f$

To select uncorrelated factors, empirical dimension reduction techniques, such as factor analysis (FA) or principal component analysis (PCA), are performed on the covariance matrix of the returns of the risky assets.

6.2 What are factor models?

6.2.1 Notations

6.2.2 Factor representation

6.3 Why factor models in finance?

6.3.1 Style analysis

6.3.2 Optimal portfolio allocation

6.4 How to build factor models?

6.4.1 Factor selection

6.4.2 Parameters estimation

6.5 Historical perspective

One factor model: associated with the Capital Asset Pricing Model (CAPM).

6.5.1 CAPM and Sharpe’s market model

measure systematic risk, it depends only upon exposure to the overall market, usually proxied by a broad stock market index, such as the S&P 500. This exposure is measured by the CAPM beta.
capital asset pricing model (CAPM) is a model used to determine a theoretically appropriate required rate of return of an asset, to make decisions about adding assets to a well-diversified portfolio.
Only by exposing a well-diversified portfolio to higher market risk can an investor expect to achieve a higher rate of return.(SML) What is the appropriate measure of risk?
Beta, the coefficient of the independent variable (the market's rate of return) in an ordinary least squares regression equation to explain the dependent variable (a security's rate of return), measures a security's relative amount of systematic (market) risk.

The return of any stock can be separated into a systematic (market) component and a residual component. No theory is required to do this.
The CAPM says that the residual component of return has an expected value of zero.
The CAPM is easy to understand and relatively easy to implement.
There is a powerful logic of market efficiency behind the CAPM.
The CAPM thrusts the burden of proof onto the active manager. It suggests that passive management is a lowerrisk alternative to active management.
The CAPM provides a valuable source of consensus expectations. The active manager can succeed to the extent that his or her forecasts are superior to the CAPM consensus forecasts.
The CAPM is about expected returns, not risk.

overview
- The model takes into account the asset's sensitivity to non-diversifiable risk (also known as systematic risk or market risk), often represented by the quantity beta (β) in the financial industry, as well as the expected return of the market and the expected return of a theoretical risk-free asset.
- $r_a = r_f + \beta (r_m - r_f)$ rm(t) is the return (in time period t) on a market index, such as the
S&P500. β = cov[r,(t), rm(t)]/var[rm(t)
- The CAPM is equivalent to the statement that the market index is itself mean-variance efficient in the sense of providing maximum average return for a given level of volatility.
breaking down
- two factors: time value of money and risk.
- individual risk premium equals the market premium times beta.
forecast of expected return
1. problem of using historical average returns
  One alternative is to use historical average returns, i.e., the average return to the stock over some previous period. This is not a good idea, for two main reasons.
  - First, the historical returns contain a large amount of sample error.
  - Second, the universe of stocks changes over time: New stocks become available, and old stocks expire or merge. The stocks themselves change over time: Earnings change, capital structure may change, and the volatility of the stock may change. Historical averages are a poor alternative to consensus forecasts.

6.5.2 APT for arbitrage pricing theory

overview
- only one type of nondiversifiable risk influences expected security returns, and that single type of risk is market risk.
- the APT recognizes that several different broad risk sources may combine to influence security returns. The intuitive appeal of the APT results from its recognition that the interaction of several macroeconomic factors such as inflation, interest rates, and business activity affects rates of return.
- A series of beta coefficients are estimated to measure the sensitivity to the respective factor risks for a particular security.
shortfalls or using accounting attributes as market risk:
- Most are based on accountingdata, and such data are generated by rules that may differ sigmficantly across firms.
- Even if allfirms used the same accountingrules, reporting dates differ, so constructing time-synchronized interfirm comparisons is difficult.
- Most importantly, no rigorous theory tells us how traditional accounting variables should be related to an appropriate measure of risk for computing the risk-return trade-off.
Basics:
- Every stock and portfolio has exposures (or betas) with respect to each of these systematic risks. The pattern of economic betas for a stock or portfolio is called its risk exposure profile.
1. APT Equations
  
  The APT follows from two basic postulates:
  1. Postulate 1.
    In every time period, the difference between the actual (realized)return and the expected return for any asset is equal to the sum, over all risk factors, of the risk exposure (the beta for that risk factor) multiplied by the realization (the actual end-of-period value) for that risk factor, plus an asset-specific (idiosyncratic) error term. \[r_i(t) - E[r_i(t)] = \beta_i1 f_1(t) + ... + \beta_iK f_K(t) + \epsilon_i(t)\]
    - It is assumed that the expectations, at the beginning of the period, for all of the factor realizations and for the asset-specific shock are zero;
    \[E[f_1(t)] = ... = E[f_K(t)] = E[\epsilon_i(t)] = 0\]
  2. Postulate 2.
    
    Pure arbitrage profits are impossible. \[E[r_i(t)] = P_0 + \beta_i1 P_1 + ... + \beta_iK P_K\] \[r_i(t) - P_0 = \beta_i1 [P_1 + f_1(t)] + ... + \beta_iK[P_K + f_K(t)] + \epsilon_i(t)\]

6.6 Glossary Volatility

7 Factor Selection

7.1 量化选股基本面选股

7.1.1 多因子模型

优点是可以综合很多信息后给出一个选股结果. 综合因子的方法有打分法和回归法两种，打分法较常见。

steps:
1. 备选因子选取
  1. 估值：账面市值比（B/M)、盈利收益率（EPS）、动态市盈（PEG）
  2. 成长性：ROE、ROA、主营毛利率（GP/R)、净利率(P/R)
  3. 资本结构：资产负债（L/A)、固定资产比例（FAP）、流通市值（CMV）
2. 因子有效性检验
  
  采用排序的方法检验备选因子的有效性。

7.1.2 风格轮动模型

7.1.3 行业轮动模型

7.2 市场行为选股

7.2.1 资金流选股

7.2.2 动量反转模型

7.2.3 一致预期模型

7.2.4 趋势追踪模型

7.3 Introduction

7.4 Qualitative know-how

7.4.1 Fama and French model

7.4.2 The Chen et al. model

7.4.3 The risk-based factor model of Fung and Hsieh

7.5 Quantitative methods based on eigenfactors

7.5.1 Notation

7.5.2 Subspace methods: the Principal Component Analysis

7.6 Model order choice

7.6.1 Information criteria

7.7 Appendix 1: Covariance matrix estimation

7.7.1 Sample mean

7.7.2 Sample covariance matrix

7.7.3 Robust covariance matrix estimation: M-estimators

7.8 Appendix 2: Similarity of the eigenfactor selection with the MUSIC algorithm

7.9 Appendix 3: Large panel data

7.9.1 Large panel data criteria

7.10 *highlights

8 Least Squares Estimation (LSE) and Kalman Filtering (KF) for Factor Modeling: A Geometrical Perspective

8.1 Introduction

8.2 Why LSE and KF in factor modeling?

8.2.1 Factor model per return

8.2.2 Alpha and beta estimation per return

8.3 LSE setup

8.3.1 Current observation window and block processing

8.3.2 LSE regression

8.4 LSE objective and criterion

8.5 How LSE is working (for LSE users and programmers)

8.6 Interpretation of the LSE solution

8.6.1 Bias and variance

8.6.2 Geometrical interpretation of LSE

8.7 Derivations of LSE solution

8.8 Why KF and which setup?

8.8.1 LSE method does not provide a recursive estimate

8.8.2 The state space model and its recursive component

8.8.3 Parsimony and orthogonality assumptions

8.9 What are the main properties of the KF model?

8.9.1 Self-aggregation feature

8.9.2 Markovian property

8.9.3 Innovation property

8.10 What is the objective of KF?

8.11 How does the KF work (for users and programmers)?

8.11.1 Algorithm summary

8.11.2 Initialization of the KF recursive equations

8.12 Interpretation of the KF updates

8.12.1 Prediction filtering, equation [3.34]

8.12.2 Prediction accuracy processing, equation [3.35]

8.12.3 Correction filtering equations [3.36]–[3.37]

8.12.4 Correction accuracy processing, equation [3.38]

8.13 Practice

8.13.1 Comparison of the estimation methods on synthetic data

8.13.2 Market risk hedging given a single-factor model

8.13.3 Hedge fund style analysis using a multi-factor model

8.14 Geometrical derivation of KF updating equations

8.14.1 Geometrical interpretation of MSE criterion and the MMSE solution

8.14.2 Derivation of the prediction filtering update

8.14.3 Derivation of the prediction accuracy update

8.14.4 Derivation of the correction filtering update

8.14.5 Derivation of the correction accuracy update

9 A Regularized Kalman Filter (rgKF) for Spiky Data

9.1 Introduction

9.2 Preamble: statistical evidence on the KF recursive equations

9.3 Robust KF

9.3.1 RKF description

9.4 rgKF: the rgKF(NG,lq)

9.4.1 rgKF description

9.4.2 rgKF performance

9.5 Application to detect irregularities in hedge fund returns

9.6 Conclusion

9.7 Chapter highlights Extended space-state model

10 Interview question:

how to construct a factor like ROE.
how to measure the risk or volatility of the factor.
what's the process of optimization.
how to optimize portfolio with non-linear constraint.
Barra risk attributes.

Footnotes:

Multi-factor Models and Signal Processing Techniques - Application to Quantitative Finance,Serges Darolles

Table of Contents