Table of Contents

1 TODO

  • [X] Algorithms and Data Structures
  • [ ] Machine Learning
    • [ ] Deep Learning
  • [X] Graph databases, Tableau, Neo4j

2 Introductory Examples

2.1 Implied Volatilities

2.2 Monte Carlo Simulation

2.3 Technical Analysis

3 Data Visualization

3.1 Two-Dimensional Plotting

3.2 Financial Plots

3.3 3D Plotting

4 Financial Time Series

4.1 pandas

4.2 Financial Data

  • data preparation & parsing
import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats import mstats

# winsorize the factor series: clip the extreme 2.5% on each tail
idx = df_factor.index
df_factor = pd.Series(index=idx, data=mstats.winsorize(df_factor, limits=[0.025, 0.025]))
start_date = df_factor.index[0]

# alternative: cap outliers at a quantile
# df_factor[df_factor > np.percentile(df_factor, 90)] = np.percentile(df_factor, 90)

# alternative: remove outliers by z-score
# df_factor = df_factor[np.abs(stats.zscore(df_factor)) < 3]

4.3 Regression Analysis

4.4 High-Frequency Data

5 Performance Python

6 Mathematical Tools

6.1 Approximation

6.1.1 Regression

The residual, \(e_i = y_i - \hat{y}_i\), is the difference between the observed value of the dependent variable, \(y_i\), and the value predicted by the model, \(\hat{y}_i\). The standard errors of the parameter estimates are given by
\[\hat{\sigma}_{\beta_1} = \hat{\sigma}_{\varepsilon}\sqrt{\frac{1}{\sum (x_i - \bar{x})^2}}\]
\[\hat{\sigma}_{\beta_0} = \hat{\sigma}_{\varepsilon}\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}} = \hat{\sigma}_{\beta_1}\sqrt{\frac{\sum x_i^2}{n}}\]

  • The standard error is a measure of sample variability.
  • The sample standard deviation divides by \(\sqrt{N-1}\) (Bessel's correction); the population standard deviation divides by \(\sqrt{N}\).
  1. linear regression
    • Linear regression deals with estimating one variable from another variable.

    \[ Y = Xk + \epsilon\]

    1. The dependent variable (Dep. Variable) states the name of the variable that is fitted.
    2. Model states which model was used in the fit; besides OLS, there are several other models such as weighted least squares (WLS).
    3. The number of observations (No. Observations) is listed, together with the degrees of freedom of the residuals (Df Residuals), that is, the number of observations (59) minus the number of parameters determined in the fit, 2 (the slope k and the constant).
    4. Df Model shows how many parameters were determined (excluding the constant, that is, the intercept).
    5. The table to the right of the top table shows how well the model fits the data.
    6. R-squared was covered before; here, the adjusted R-squared value (Adj. R-squared) is also listed, which is the R-squared value corrected for the number of data points and degrees of freedom.
    7. The F-statistic gives an estimate of how significant the fit is. Practically, it is the mean squared error of the model divided by the mean squared error of the residuals.
    8. The next value, Prob (F-statistic), gives the probability of obtaining the F-statistic value if the null hypothesis is true, that is, if the variables are not related.
    9. After this, three log-likelihood-based values follow: the log-likelihood value of the fit, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The AIC and BIC are different ways of adjusting the log-likelihood for the number of observations and the model type. (A statsmodels sketch of such a fit follows this list.)
  2. multiple regression
    • a variable is estimated from two or more others.
  3. logistic regression

    Logistic regression fits models to one or more discrete variables, which are sometimes binary (that is, they can only take the values 0 or 1).

    • One of the main differences between binary logistic and linear regression is that in binary logistic regression we fit the probability of an outcome given a measured (discrete or continuous) variable,
    • while linear regression models characterize the dependency of two or more continuous variables on each other.
    • Logistic regression gives the probability of an occurrence given some observed variable(s). This probability is sometimes written as P(Y | X) and read as "the probability that the value is Y given the variable X".

    \[\ln\left(\frac{p}{1-p}\right) = m + kx\]
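
To make the OLS summary fields from the linear-regression item above concrete, here is a minimal sketch using statsmodels on simulated data (the variable names and the simulated slope, intercept, and noise level are only illustrative):

    import numpy as np
    import statsmodels.api as sm

    # simulated data: y depends linearly on x plus noise
    np.random.seed(0)
    x = np.linspace(0, 10, 60)
    y = 2.5 * x + 1.0 + np.random.normal(scale=2.0, size=x.size)

    # add a constant column so the intercept is estimated as well
    X = sm.add_constant(x)
    results = sm.OLS(y, X).fit()

    # the summary lists Dep. Variable, No. Observations, Df Residuals, Df Model,
    # R-squared, Adj. R-squared, F-statistic, Prob (F-statistic),
    # Log-Likelihood, AIC and BIC, as described above
    print(results.summary())

    # standard errors of the intercept and slope estimates
    print(results.bse)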

6.1.2 Interpolation

6.1.3 Clustering

  1. K-means algorithm for cluster finding.

    The K-means algorithm is also referred to as vector quantization. The algorithm finds the cluster (centroid) positions that minimize the distance from each point to its cluster centroid (a minimal sketch using scikit-learn and SciPy follows this list).

  2. hierarchical clustering.

    Hierarchical clustering is connectivity-based clustering. It assumes that the clusters are connected, or in other words, linked.

    1. Agglomerative clustering starts out with each point in its own cluster and then merges the two clusters with the lowest dissimilarity, that is, the bottom-up approach
    2. Divisive clustering is, as the name suggests, a top-down approach where we start out with one single cluster that is divided into smaller and smaller clusters

      Magnus Vilhelm Persson. Mastering Python Data Analysis (Kindle Locations 2335-2338). Packt Publishing. Kindle Edition.
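
A minimal sketch of both approaches on synthetic two-dimensional data, using scikit-learn's KMeans and SciPy's hierarchical-clustering routines (the data and parameter choices are only illustrative):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.cluster import KMeans

    # two synthetic point clouds in the plane
    np.random.seed(1)
    points = np.vstack([np.random.normal(0.0, 0.5, size=(50, 2)),
                        np.random.normal(3.0, 0.5, size=(50, 2))])

    # K-means: centroid positions that minimize the distance of each point to its centroid
    km = KMeans(n_clusters=2).fit(points)
    print(km.cluster_centers_)

    # agglomerative (bottom-up) hierarchical clustering: start with every point in
    # its own cluster and repeatedly merge the least dissimilar pair
    Z = linkage(points, method='ward')
    labels = fcluster(Z, t=2, criterion='maxclust')
    print(labels[:10])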

6.2 Convex Optimization

6.2.1 Global Optimization

6.2.2 Local Optimization

6.2.3 Constrained Optimization

6.3 Integration

6.3.1 Numerical Integration

6.3.2 Integration by simulation

6.4 Symbolic Computation

7 Stochastics

7.1 Random Numbers

7.2 Simulation

7.2.1 Random Variables

7.2.2 Stochastic Processes

7.2.3 Variance Reduction

7.2.4 Valuation

  1. European options
  2. American options

7.2.5 Risk Measure

  1. VaR
  2. Credit Value Adjustments

8 Statistics

8.1 Normality Tests

8.1.1 Benchmark Case

8.1.2 Real-World Data

8.2 Portfolio Optimization

  1. steps:

    input: weights, percentage return, percentage volatility, constraints, bounds.

    import numpy as np
    import scipy.optimize as sco

    rets = np.log(data / data.shift(1))        # log returns from the price data
    # percentage return = np.sum(rets.mean() * weights) * 252
    noa = rets.shape[1]                        # number of assets
    cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})   # weights sum to one
    bnds = tuple((0, 1) for _ in range(noa))   # no-short-selling bounds

    calculate:

    def statistics(weights):
        ''' Return portfolio statistics.
    
        Parameters
        ==========
        weights : array-like
            weights for different securities in portfolio
    
        Returns
        =======
        pret : float
            expected portfolio return
        pvol : float
            expected portfolio volatility
        pret / pvol : float
            Sharpe ratio for rf=0
        '''
        weights = np.array(weights)
        pret = np.sum(rets.mean() * weights) * 252
        pvol = np.sqrt(np.dot(weights.T, np.dot(rets.cov() * 252, weights)))
        return np.array([pret, pvol, pret / pvol])
    def min_func_sharpe(weights):
        return -statistics(weights)[2]
    opts = sco.minimize(min_func_sharpe, noa * [1. / noa,], method='SLSQP',
                           bounds=bnds, constraints=cons)
    

    output:

    opts
    Out[22]:
         fun: -0.89964063622932411
         jac: array([  3.65152955e-05,   2.00167218e+00,  -1.04084611e-04,
             3.82214785e-05,   7.63027400e-01,   0.00000000e+00])
     message: 'Optimization terminated successfully.'
        nfev: 63
         nit: 9
        njev: 9
      status: 0
     success: True
           x: array([  3.16847434e-01,   8.62049147e-16,   2.64774759e-01,
             4.18377806e-01,   0.00000000e+00])
    
  2. algorithms for minimizing with constraints:

    scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None,
                            hessp=None, bounds=None, constraints=(), tol=None,
                            callback=None, options=None)

8.2.1 Efficient Frontier

8.2.2 Capital Market Line

  1. Typical scenarios (a sketch of the efficient-frontier computation follows this list):
    • solve for minimum risk subject to a minimum target return.

      Here we find the portfolio that minimizes the return variance (which is associated with the risk of the portfolio) subject to achieving a minimum acceptable mean return rmin, and satisfying the portfolio budget and no-shorting constraints.

    • solve for maximum return subject to risk below a target.
    • solve for minimum risk.
    • solve for maximum return.
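
A minimal sketch of the efficient-frontier computation referenced above: for each target return, minimize the portfolio volatility returned by the statistics function from the earlier code, subject to hitting the target and staying fully invested (statistics, noa, and bnds are assumed to be defined as above; the range of target returns is illustrative):

    import numpy as np
    import scipy.optimize as sco

    def min_func_port(weights):
        # portfolio volatility is the quantity to minimize
        return statistics(weights)[1]

    trets = np.linspace(0.0, 0.25, 50)     # target returns to trace out the frontier
    tvols = []
    for tret in trets:
        # constraints: hit the target return and stay fully invested
        cons = ({'type': 'eq', 'fun': lambda x: statistics(x)[0] - tret},
                {'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
        res = sco.minimize(min_func_port, noa * [1. / noa,], method='SLSQP',
                           bounds=bnds, constraints=cons)
        tvols.append(res['fun'])
    tvols = np.array(tvols)                # minimum volatility for each target return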

8.3 Principal Component Analysis

PCA

8.3.1 The DAX index and its 30 stocks

8.3.2 Applying PCA

8.3.3 Constructing a PCA Index

8.4 Bayesian Regression

A Bayesian network (also called a Bayes network, belief network, Bayes(ian) model, or probabilistic directed acyclic graphical model) is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
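
As a toy numerical illustration of the diseases-and-symptoms example (all probabilities below are made up), Bayes's formula turns the probability of a symptom given the disease into the probability of the disease given the symptom:

    # hypothetical numbers: P(disease), P(symptom | disease), P(symptom | no disease)
    p_d = 0.01
    p_s_given_d = 0.90
    p_s_given_not_d = 0.05

    # total probability of observing the symptom
    p_s = p_s_given_d * p_d + p_s_given_not_d * (1 - p_d)

    # Bayes's formula: P(disease | symptom) = P(symptom | disease) * P(disease) / P(symptom)
    p_d_given_s = p_s_given_d * p_d / p_s
    print(round(p_d_given_s, 3))   # about 0.154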

8.4.1 Bayes's formula

8.4.2 PyMC3

9 Valuation Framework

9.1 Fundamental Theorem of Asset Pricing

9.2 Risk-Neutral Discounting

9.2.1 Modeling and Handling Dates

9.2.2 Constant Short Rate

9.2.3 Market Environment

10 Simulation of Financial Models

10.1 Random Number Generation

10.2 Generic Simulation Class

10.3 Geometric Brownian Motion

10.4 Jump Diffusion

10.5 Square-Root Diffusion

11 Derivatives Valuation

11.1 Generic Valuation Class

11.2 European Exercise

11.3 American Exercise

11.3.1 Least-Squares Monte Carlo

12 Portfolio Valuation

12.1 Derivatives positions

12.2 Derivatives portfolio

13 Volatility Options

13.1 The VSTOXX Data

13.1.1 VSTOXX Index Data

13.1.2 VSTOXX Futures Data

13.1.3 VSTOXX Options Data

13.2 Model Calibration

13.3 American Options on the VSTOXX

14 Unstructured Data Visualization

15 Optimization Algorithms (conic and stochastic optimization first)

15.1 Gradient descent

In optimization, a gradient method is an algorithm for solving problems of the form \[\min_{x} f(x)\] using the gradient of the objective.

15.1.1 Gradient descent

Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.

Limitations: gradient descent is relatively slow close to the minimum; technically, its asymptotic rate of convergence is inferior to that of many other methods. For poorly conditioned convex problems, gradient descent increasingly 'zigzags' as the gradients point nearly orthogonally to the shortest direction to a minimum point.

For non-differentiable functions, gradient methods are ill-defined.
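
A minimal sketch of gradient descent on a simple quadratic (the step size, starting point, tolerance, and test function are arbitrary illustrative choices):

    import numpy as np

    def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10000):
        # take steps proportional to the negative of the gradient at the current point
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            x_new = x - step * grad(x)
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x

    # minimize f(x, y) = (x - 1)^2 + 2 * y^2, whose gradient is (2(x - 1), 4y)
    minimum = gradient_descent(lambda v: np.array([2 * (v[0] - 1), 4 * v[1]]), [5.0, 3.0])
    print(minimum)   # close to [1, 0]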

15.1.2 Conjugate gradient method

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite: \[Ax = b,\] where two nonzero directions \(u\) and \(v\) are said to be conjugate with respect to \(A\) if \(u^{\mathsf{T}} A v = 0\).
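
A minimal sketch of the conjugate gradient iteration for a small symmetric positive-definite system (the matrix, right-hand side, and tolerance are illustrative; scipy.sparse.linalg.cg provides a production implementation):

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10):
        # solve A x = b for symmetric positive-definite A
        x = np.zeros_like(b, dtype=float)
        r = b - A @ x          # residual
        p = r.copy()           # search direction, A-conjugate to the previous ones
        for _ in range(len(b)):
            Ap = A @ p
            alpha = (r @ r) / (p @ Ap)
            x = x + alpha * p
            r_new = r - alpha * Ap
            if np.linalg.norm(r_new) < tol:
                break
            beta = (r_new @ r_new) / (r @ r)
            p = r_new + beta * p
            r = r_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive-definite
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))          # matches np.linalg.solve(A, b)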

15.2 Stochastic optimization

Stochastic optimization (SO) methods are optimization methods that generate and use random variables. For stochastic problems, the random variables appear in the formulation of the optimization problem itself, which involves random objective functions or random constraints. Stochastic optimization methods also include methods with random iterates. Some stochastic optimization methods use random iterates to solve stochastic problems, combining both meanings of stochastic optimization. Stochastic optimization methods generalize deterministic methods for deterministic problems.

15.2.1 Random search

Random search (RS) is a family of numerical optimization methods that do not require the gradient of the problem to be optimized, and RS can hence be used on functions that are not continuous or differentiable. Such optimization methods are also known as direct-search, derivative-free, or black-box methods.

The name "random search" is attributed to Rastrigin, who made an early presentation of RS along with basic mathematical analysis. RS works by iteratively moving to better positions in the search-space, which are sampled from a hypersphere surrounding the current position.

The basic RS algorithm can then be described as:

  1. Initialize x with a random position in the search-space.
  2. Until a termination criterion is met (e.g. a number of iterations performed, or adequate fitness reached), repeat the following:
    • Sample a new position y from the hypersphere of a given radius surrounding the current position x (see e.g. Marsaglia's technique for sampling a hypersphere).
    • If f(y) < f(x), move to the new position by setting x = y.
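
A minimal sketch of the loop described above (the objective, sampling radius, and iteration count are illustrative; a point on the hypersphere is obtained by normalizing a Gaussian draw):

    import numpy as np

    def random_search(f, x0, radius=0.5, n_iter=1000, seed=0):
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            # sample a point on the hypersphere of the given radius around x
            direction = rng.normal(size=x.size)
            y = x + radius * direction / np.linalg.norm(direction)
            # move only if the candidate improves the objective
            if f(y) < f(x):
                x = y
        return x

    # works on a non-differentiable objective, |x| + |y - 1|
    best = random_search(lambda v: abs(v[0]) + abs(v[1] - 1), [4.0, -3.0])
    print(best)   # approaches [0, 1], to within roughly the sampling radius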

15.2.2 Bayesian optimization

Bayesian optimization is a sequential design strategy for global optimization of black-box functions that doesn't require derivatives.

16 Graphical Models, e.g.,

  • Conditional Random Fields
  • Bayesian Networks

17 Genetic Algorithm

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection. The evolution usually starts from a population of randomly generated individuals, and is an iterative process, with the population in each iteration called a generation. In each generation, the fitness of every individual in the population is evaluated; the fitness is usually the value of the objective function in the optimization problem being solved. The more fit individuals are stochastically selected from the current population, and each individual's genome is modified (recombined and possibly randomly mutated) to form a new generation. The new generation of candidate solutions is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population.
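
A minimal sketch of the loop just described, maximizing the number of 1-bits in a bit string (population size, tournament selection, and mutation rate are arbitrary illustrative choices):

    import random

    def genetic_algorithm(n_bits=20, pop_size=30, generations=100, mutation_rate=0.02):
        fitness = sum                 # fitness: number of 1-bits in the string
        # initial generation: random bit strings
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

        def select():
            # tournament selection: the fitter of two randomly chosen individuals
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        for _ in range(generations):
            new_pop = []
            while len(new_pop) < pop_size:
                p1, p2 = select(), select()
                # crossover: recombine the two parents at a random cut point
                cut = random.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
                # mutation: flip each bit with a small probability
                child = [1 - bit if random.random() < mutation_rate else bit for bit in child]
                new_pop.append(child)
            pop = new_pop
        return max(pop, key=fitness)

    print(genetic_algorithm())   # typically all, or nearly all, ones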

18 First order and Propositional Rule Based Systems, e.g.,

  • Tractable Markov Logic
  • Prolog
  • Lifted Inverse Deduction Algorithms

19 Recurrent Nets, e.g.,

  • LSTM

20 Natural language processing, e.g.

  • Auto text generation
  • Auto Text Summary

21 Reinforcement Learning

22 Decision Trees (ensembles)

22.1 Data Preprocessing: Discretization

Discretization means cutting continuous data into a number of segments, also called bins; it is a common technique in data analysis. The cut points can be chosen by equal width, equal frequency, optimization, or according to the characteristics of the data. Discretization is widely used in marketing data mining, for several reasons:

  • Algorithmic requirements. For example, algorithms such as decision trees and Naive Bayes cannot use continuous variables directly; continuous data can only enter these algorithms after being discretized.
  • Discretization can effectively overcome hidden defects in the data and make model results more stable. For example, extreme values are an important factor affecting model quality: they push model parameters too high or too low, or mislead the model into learning spurious relationships as important patterns. Discretization, especially equal-width binning, effectively dampens the influence of extreme values and outliers.
  • It helps diagnose and describe nonlinear relationships: after discretizing continuous data, the relationship between the independent variable and the target variable becomes clearer. If the relationship is nonlinear, the value of each bin can be redefined, for example as 0/1 indicators, deriving several dummy variables from one variable and determining the link between each bin and the target separately. Although this reduces the model's degrees of freedom, it can greatly increase the model's flexibility.
  • Equal width: split the range of the continuous variable into n intervals of equal width. For example, the length of time a customer has subscribed to a magazine is a continuous variable ranging from a few days to several years; with equal-width binning, customers subscribed for less than 1 year form one group, 1-2 years the next, 2-3 years the next, and so on, with a width of one year per group.
  • Equal frequency: split the observations into n groups containing the same number of observations. Continuing the example above, if the magazine has 50,000 subscribers, equal-frequency binning first sorts the subscribers by subscription length and then splits them into ten groups of 5,000 subscribers each.
  • Discretization inevitably loses some information: after a continuous variable is binned, the differences between observations within the same bin disappear. (A short pandas binning sketch follows this list.)
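
A minimal sketch of equal-width and equal-frequency binning with pandas (the subscription lengths are made-up sample data):

    import numpy as np
    import pandas as pd

    # made-up subscription lengths in years
    np.random.seed(0)
    years = pd.Series(np.random.exponential(scale=2.0, size=1000))

    # equal-width binning: intervals one year wide
    equal_width = pd.cut(years, bins=np.arange(0, np.ceil(years.max()) + 1))

    # equal-frequency binning: ten bins holding roughly the same number of subscribers
    equal_freq = pd.qcut(years, q=10)

    print(equal_width.value_counts().sort_index().head())
    print(equal_freq.value_counts().sort_index().head())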

22.2 Random Forests

A random forest is a classifier that trains many trees on samples and combines their predictions. A decision tree is a basic classifier that usually splits the feature space into two classes (decision trees can also be used for regression, which is not covered here). A fitted decision tree has a tree structure and can be regarded as a collection of if-then rules; its main advantages are that the model is readable and classification is fast.

22.2.1 Building a Random Forest

  1. Random selection of the data:
    • Draw samples with replacement from the original dataset to build sub-datasets, each containing the same number of records as the original. Records may repeat across different sub-datasets and within a single sub-dataset.
    • Second, build a sub-decision-tree from each sub-dataset; feeding a record into each sub-tree produces one result per tree.
    • Finally, when new data needs a classification from the random forest, the forest's output is obtained by voting over the sub-trees' results. For example, if the random forest contains 3 sub-trees, 2 of which classify a sample as class A and 1 as class B, then the random forest classifies it as class A.
  2. Random selection of the candidate features

    As with the random selection of data, each split in a sub-tree of the random forest does not use all candidate features; instead, a subset of the features is drawn at random and the best feature is chosen from that subset. This makes the decision trees in the forest differ from one another, which increases the diversity of the system and thus improves classification performance.

  3. Using a Random Forest in practice with sklearn

    The sections above describe how a random forest works; in Python, the sklearn package can do the job for us. A small example: the features are the SMA, WMA, and MOM indicators computed from closing prices. For each trading day from 2007-01-04 to 2016-06-02, the training features are the indicators up to the previous day, and the training label is that day's direction: True if the index rose, False if it fell. The test sample is the three indicators and the up/down outcome for 2016-06-03. Afterwards we can check whether the prediction is correct: if the Random Forest prediction matches that day's actual direction, the comparison prints True; otherwise it prints False.

    import datetime
    import talib
    from jqdata import *
    test_stock = '399300.XSHE'
    start_date = datetime.date(2007, 1, 4)
    end_date = datetime.date(2016, 6, 3)
    trading_days = get_all_trade_days()
    start_date_index = trading_days.index(start_date)
    end_date_index = trading_days.index(end_date)
    x_all = []
    y_all = []
    
    for index in range(start_date_index, end_date_index):
        # get all the data needed to compute the indicators
        start_day = trading_days[index - 30]
        end_day = trading_days[index]
        stock_data = get_price(test_stock, start_date=start_day, end_date=end_day,
                               frequency='daily', fields=['close'])
        close_prices = stock_data['close'].values
        # compute the indicators from the data;
        # index -2 makes sure the indicator only uses data up to yesterday,
        # while -1 would be the indicator computed with today's data
        sma_data = talib.SMA(close_prices)[-2]
        wma_data = talib.WMA(close_prices)[-2]
        mom_data = talib.MOM(close_prices)[-2]
        features = []
        features.append(sma_data)
        features.append(wma_data)
        features.append(mom_data)
        label = False
        if close_prices[-1] > close_prices[-2]:
            label = True
        x_all.append(features)
        y_all.append(label)
    # prepare the data used by the learning algorithm
    x_train = x_all[:-1]
    y_train = y_all[:-1]
    x_test = x_all[-1]
    y_test = y_all[-1]
    print('data done')
    
    Output:
    data done
    
    from sklearn.ensemble import RandomForestClassifier
    # run the learning algorithm; n_estimators is the number of trees in the forest
    clf = RandomForestClassifier(n_estimators=10)
    # train the classifier
    clf.fit(x_train, y_train)
    # get the prediction for the test sample (predict expects a 2D array,
    # so the single sample is wrapped in a list)
    prediction = clf.predict([x_test])
    # check whether the prediction was right
    print(prediction == y_test)
    print('all done')
    Output:
    [ True]
    all done
    

23 Instance Based Learning

  • SVM
  • k-nearest neighbor
  • Amazon Netflix Recommendation system

24 Time Series Analysis, e.g.,

  • Co-integration
  • VAR

25 UX Design and Psychology

26 Track

| programming | level |
|-------------+-------|
| Lisp        |     1 |
| VBA         |     3 |
| C/C++       |     6 |
| SQL         |     5 |
| Matlab      |     5 |
| R           |     4 |
| Python      |     7 |

| Machine Learning | Models                                           | level |
|------------------+--------------------------------------------------+-------|
| Neural Networks  | Convolutional neural network                     |     0 |
|                  | long short-term memory                           |     0 |
|                  | Autoencoder                                      |     0 |
|                  | Bayesian networks                                |     1 |
|                  | PCA                                              |     5 |
|                  | K-Means                                          |     1 |
|                  | SVM                                              |     1 |
| Optimization     | Linear OLS (mean variance)                       |     4 |
|                  | Genetic Algorithm                                |     0 |
|                  | h(params,x) function: hypothesis                 |     0 |
|                  | J(params,x,y) function: cost function            |     0 |
|                  | grad(params,x,y) function: Gradient Descent      |     1 |
| Time Series      | autoregressive (AR)                              |     1 |
|                  | moving average (MA)                              |     2 |
|                  | autoregressive moving average (ARMA)             |     1 |
|                  | autoregressive integrated moving average (ARIMA) |     1 |