Table of Contents

1 2017

1.1 September 2017

1.1.1 Linear and nonlinear thinking <2017-09-23 Sat>

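The world is nonlinear, but to adapt to its environment the human brain often falls back on linear thinking to judge its surroundings, and so makes rather mechanical decisions (Thinking, Fast and Slow). For example, people assume that success has a secret and follows a fixed pattern, so they extract a list of traits from successful people and believe that anyone with those traits will naturally succeed. That is also why bestsellers of this kind keep selling. But people forget that success has no template: a great many people may share the same traits, and those who had them and failed are rarely mentioned; survivorship bias leaves them forgotten. By the same logic, a linear model assumes that stocks selected through a series of screening funnels have a high probability of rising or rebounding, but these alphas often fail the test of the real world, because beta factors also drive stock prices. Break out of linear thinking and reason with nonlinear models. As Taleb says in The Black Swan, when most of the swans people see are white they conclude that all swans are white, and only when a black swan appears do they remember that swans can also be black. Think in reverse: after proposing a hypothesis, try to refute it with counterexamples.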

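1.1.2 Anomalies <2017-09-27 Wed>

Science arises from summarizing experience. People habitually look for regularities so that they can use them to support production.

But in investing there is no rule that works everywhere. This is much like the black swan phenomenon: people do not learn that not all swans are white until a black swan appears. That a rule has been applied in many settings, many times, does not make it the law underlying those applications.

When investing, avoid crowded trades. Think in reverse, use elimination to rule out what the crowd believes, and look instead for anomalies, where risk is low and return is high.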

1.2 October 2017

1.2.1 Zero-sum game theory<2017-10-19 Thu>

When applied specifically to economics there are multiple factors to consider when understanding a zero-sum game. Zero-sum game assumes a version of perfect competition and perfect information; that is, both opponents in the model have all the relevant information to make an informed decision. To take a step back, most transactions or trades are inherently non zero-sum games because when two parties agree to trade they do so with the understanding that the goods or services they are receiving are more valuable than the goods or services they are trading for it, after transaction costs. This is called positive-sum, and most transactions fall under this category.

Accumulating information is a key element of trading, so substantial economic resources go into research. Most of the time institutional investors have more and deeper information than individual investors, which makes the game unfair. The disposition effect makes the imbalance even worse: investors frequently buy high and sell low, which brings them losses.

Growth in corporate value brings more value to investors as long as stock prices rise over the long term, in aggregate, with economic expansion.

However, this is different from the poker game, where winners' gain is losers' loss.

1.2.2 Survival in the financial technology industry<2017-10-23 Mon>

Two competitive factors affect people's career expectations in the financial technology industry. One is that the industry advances with the times: beginners/newbies bring more resources, including new knowledge and more capital, so you have to learn more to compete with your peers. The other is the threat of artificial intelligence: because of its high efficiency, AI will eliminate most of today's jobs in the next few decades. Imagine competing with AI that is many times more productive than the average person; it is not hard to conclude that you need to study much, much harder to face that competition in the future.

The benefit of not being eliminated, of still having a job in the future, is huge. It means you can still be productive in a job that has not been taken by robots or other competitors. This high payoff will give the survivors enough resources to pursue what people have wanted most for thousands of years: longevity. To prove that you can still produce goods for society, you need a skilled job that creates value. Otherwise others will not recognize your future social function, and natural aging will be the end. Longevity can be brought about by a high level of medical care, which could halt aging or, even better, modify your organs.

1.2.3 Why not follow a standard course schedule on studying after your graduation?<2017-10-23 Mon>

A standard course is designed for average people, or even worse for the lowest level, to save society time and money. For people above average, a smarter way is to adapt the learning process rather than follow old-fashioned study methods: learn the basic principles, then choose one topic related to your work or project and go deeper.

1.2.4 Follow the advice from the master<2017-10-24 Tue>

  1. Principle:
    • Be perceptive: ask for the reasons behind the facts.
    • Take more initiative: approach people and present your ideas.
    • Be more efficient and work with more focus than others.
  2. Bad habits:

    Laziness, taking shortcuts, in-sample testing on GS, not challenging other people.

1.3 November 2017

1.3.1 Learning techniques<2017-11-05 Sun>

The most efficient way to learn something new is to understand it deeply, so reading a good textbook is a great way in. Other resources include MOOCs, course notes, and research papers.

  1. Motivation[1]

    Today's educational system up through college micromanages the learning process, thereby doing little to prepare you for independent study. How do you go about learning something on your own?

    1. The obstacles of motivation from the traditional education system
      1. isolated environment (students have no idea why they'd need to know these things).
      2. no personalized curriculum.
      3. course bulletins provide almost no guidance about what a class covers or why it's useful (after leaving school you only care about skills related to your interests, project or job).
      4. impossible to go back to repair gaps in students' knowledge.
      5. school is the result of a series of compromises which allow an adequate amount of learning without overburdening society's resources. Unfortunately, these compromises mean there's often little intrinsic motivation to learn the material, hence the need for external motivators like mandatory homeworks and exams.
  2. Metacognition

    When you read a textbook or listen to a lecture, it's easy to trick yourself into thinking you understand it. You can't directly observe your state of understanding. You need to find ways to test your understanding, for instance:

    • try to write out the definitions, theorems, proofs, etc. from memory
    • do some exercises
    • try to explain the ideas in your own words

    One trick I've found to make it more obvious is parallel reading: read two different (possibly unrelated) things in parallel. Read a few paragraphs of one, then a few paragraphs of the other, and so on. Each time you switch, try to recall what the previous stretch of text was about. Switching forces it out of your working memory (the kind that stores words and images for short times). Therefore, you find out whether the material actually made it into your long-term memory.

  3. Memory

    If you're trying to learn math, programming, or anything closely related, try not to do much rote memorization. You get familiar with the concepts by using them.

1.4 December 2017

1.4.1 Use model assumption to increase the productivity<2017-12-05 Tue>

Think about the reading as an investment. We need the cutting edge and be early. If we can't replicate the model, cut the loss.

Summarize paper.

What data they use, what do we have, what's the assumption, what's the conclusion, what's the constraint of that paper.

Without backtesting, you have no idea if you can beat the market.

If you want to at least beat average, find the benchmark. But if the benchmark is too high, forget it, or find another opportunity.
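A minimal sketch of the benchmark check described above, assuming a backtest has already produced per-period simple returns for the strategy and for the benchmark. The function names and the monthly return numbers are hypothetical and only illustrate the comparison.

```python
def cumulative_return(period_returns):
    """Compound a list of simple per-period returns into one total return."""
    total = 1.0
    for r in period_returns:
        total *= 1.0 + r
    return total - 1.0


def beats_benchmark(strategy_returns, benchmark_returns):
    """Return (excess_return, beat) over the same backtest window."""
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("both series must cover the same periods")
    excess = (cumulative_return(strategy_returns)
              - cumulative_return(benchmark_returns))
    return excess, excess > 0


# Hypothetical monthly returns, for illustration only.
strategy = [0.02, -0.01, 0.03, 0.01]
benchmark = [0.01, 0.00, 0.02, 0.01]
excess, beat = beats_benchmark(strategy, benchmark)
```

If `beat` is false over a reasonable window, the note's rule applies: drop the idea or look for another opportunity.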

1.4.2 How to learn things fast<2017-12-11 Mon>

How to efficiently learn things from a meeting or presentation.

If the presenter covers something I already know, connect it with my objective. For example, Zhongwei and I both know the mean reversion strategy, so the question is how LSTM and PCA are used on this subject.

You don't have to know every detail of the presentation; connect it with your own field by using it in your project.

Machine learning is for ungifted people: feed the model big chunks of data and let it work out a model/prediction method for you.

Learning a new technology or body of knowledge follows the same principle: connect it with your previous knowledge, understanding or experience, and predict the unknown field. If the prediction is right, your understanding is already well calibrated; if not, correct or re-calibrate it.

Three key points to keep in mind in machine learning: objective function, optimization process, prediction.
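The three points can be seen in the smallest possible example: fitting a straight line. This is only a sketch; the data, the learning rate and the step count are made-up values chosen so that plain gradient descent converges.

```python
def objective(w, b, xs, ys):
    """Objective function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def optimize(xs, ys, lr=0.05, steps=2000):
    """Optimization process: plain gradient descent on the objective."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Prediction: apply the fitted parameters to a new input."""
    return w * x + b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = optimize(xs, ys)
```

Swapping in a different model changes `predict` and the gradients, but the same three pieces remain.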

1.4.3 Objective, Obstacle, Solution<2017-12-26 Tue>

The challenge financial analysts face is how to extract useful information from big data, which includes news articles, financial statements, tables, graphs, etc. Our objective is to give them a tool that analyzes this information through an understandable workflow, so that they can select strategies and portfolios and make investment decisions more efficiently.

One tool that can improve workflow decisions is to have machines understand human needs, which is referred to as a Natural Language Processing (NLP) pipeline.

1.4.4 Neural Networks on stock valuation<2017-12-26 Tue>

Use an LSTM model together with a valuation model, such as the discounted cash flow model[2], then feed the output into another valuation model, such as the dividend discount model, to predict returns for a given industry.
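For reference, a sketch of the two valuation formulas only. The LSTM part is not shown: the forecast cash flows and dividend that the note suggests would come from a model are replaced here by hypothetical numbers, and the constant-growth (Gordon) form is assumed for the dividend discount model.

```python
def discounted_cash_flow(cash_flows, discount_rate):
    """Present value of cash flows received at the end of years 1, 2, ..."""
    return sum(cf / (1.0 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


def gordon_growth_value(next_dividend, discount_rate, growth_rate):
    """Dividend discount model with constant growth: D1 / (r - g)."""
    if discount_rate <= growth_rate:
        raise ValueError("discount rate must exceed growth rate")
    return next_dividend / (discount_rate - growth_rate)


# Hypothetical forecasts standing in for model output.
dcf_value = discounted_cash_flow([110.0, 121.0], discount_rate=0.10)
ddm_value = gordon_growth_value(2.0, discount_rate=0.08, growth_rate=0.03)
```

Comparing such values with current prices gives an implied return, which is the quantity the note wants to predict per industry.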

1.4.5 Research methods<2017-12-28 Thu>

  1. Feasibility study: Know paper details
    • Is this case possible? For example, is it possible to apply a Variational Autoencoder to the mean and variance from a Gaussian mixture model result? Can a graph CNN be applied to a graph with directions on the signal? Can neural-network physical methods be applied to multivariate time series data?
    • Is it efficient to solve the problem?
      • Setup building productivity: is it worth spending time on building the wheel? Ask whether it is a one-time application or a repeated one. Using an interpreted language can save a lot of coding time. OOP is more suitable for efficiency in a running product.
    • Know the data well (structure, type, statistics, number, size, length) before deciding how a model is applied to a specific problem.
    • Proof by construction

    Build a prototype with sample data, then with a meaningful small sample of data.

    • Conclusion:

    Lessons learned from such a study. Explain the axes and legend when showing a figure.

  2. Comparative study: pros and cons
    • Be sensitive to the difference between the models applied in the same field, their data, context, algorithm, results, advantage and drawback, etc.
    • Which is better? Main contribution.
  3. Literature study: what is known/unknown
    • Do we have such data?
    • What is the data like, its properties?
  4. Limit test: what if
    • Think about the limits of each model. When data changes, is it still appropriate to use such method to process data, to use the model, to get the same conclusion?

2 2018 first half

2.1 January 2018

2.1.1 about learning on your own.<2018-01-04 Thu>

  1. obstacles to motivation:
    • Students set aside a decade or two of their lives to learn (or at least pretend to learn) things long before they have any idea why they'd need to know them.
    • Students have little or no control over their course of studies, since the educational system processes far too many students to be able to personalize the curriculum.
    • Even at the college level, course bulletins provide almost no guidance about what a class covers or why it's useful (lest too many trees be killed).
    • The choice of topics favors things which can be easily tested, such as rote manipulations.
    • Since the whole class must progress through the curriculum at the same rate, it's impossible to go back to repair gaps in students' knowledge. This means the only workable strategy is to rehearse each topic endlessly before moving on to the next one.

    These compromises mean there's often little intrinsic motivation to learn the material, hence the need for external motivators like mandatory homeworks and exams.

    But notice that all the items in the above list are responses to various constraints which don't apply when you are teaching yourself.

    The main way to stay motivated is to learn topics you're interested in learning.

  2. Metacognition

    Learning without a teacher presents many challenges, but in my experience the biggest is that you don't get any feedback. (That is, unless you're using a service with interactive exercises, such as Khan Academy.) Without feedback, you need to monitor your own understanding to decide if you're done or which gaps still need to be filled. The ability to introspect on and manage your own thought processes is known as metacognition.

    You need to find ways to test your understanding, for instance:

    • try to write out the definitions, theorems, proofs, etc. from memory
    • do some exercises
    • try to explain the ideas in your own words

    You might have had the experience while reading that your mind wandered and you completely missed everything in the last few paragraphs. It's hard to notice your lack of attention. One trick I've found to make it more obvious is parallel reading: read two different (possibly unrelated) things in parallel. Read a few paragraphs of one, then a few paragraphs of the other, and so on. Each time you switch, try to recall what the previous stretch of text was about. Switching forces it out of your working memory (the kind that stores words and images for short times). Therefore, you find out whether the material actually made it into your long-term memory.

    If you're studying on your own using multiple resources, the clashes of notation might be confusing. But I often find that confusion about notation reflects an underlying confusion about ideas, and that having to confront multiple notations ultimately helps my understanding.

2.1.2 coding review<2018-01-04 Thu>

  • sequence for using an open-source model
    1. read and understand the paper: input data structure, algorithm, limitations, drawbacks, efficiency, comparison with other models/algorithms.
    2. read the open-source code documentation, demo code, and data preparation process.
    3. do a Q&A on decision making.
    4. don't place high confidence in wild guesses; dig in depth and don't let doubt stand in your way.

2.1.3 Integrating NLP with GS<2018-01-26 Fri>

  1. The big picture of GS is to help analysts search for strategy ideas, adaptively construct data inputs (X, y), and incrementally backtest strategies.
  2. NLP techniques can transform unstructured documents into structured graph data, extracting entities, views, concepts, ideas and events from research papers/reports, announcements, news, books, etc., and then recommend next research steps to users.
  3. Information sentiment analysis. Web crawling with a keyword can extract the hot events/results that users care about most from different web sources, and so track the subjects continuously.
  4. Document classification can extract topics and incrementally search for information starting from previous results, thus recommending plenty of relevant results from one context to another.

    First of all, strategy research ideas come from various sorts of unstructured data such as research papers/reports and news events. Where does an analyst start when facing tons of structured and unstructured data?

    NLP techniques can transform unstructured documents into structured graph data, and so extract a multi-perspective understanding of entities, views, concepts, ideas and events from research papers/reports, announcements, news, books, etc. They can then give users more adaptive recommendations for next research steps, helping them finish strategy construction and backtesting in different contexts: for example, what data a paper uses, the main ideas behind it, what models/algorithms can be used, and what the results look like (positive or negative).

    A more detailed example is research on the Black-Litterman model. The Black-Litterman model uses a Bayesian approach to combine the subjective views of an investor regarding the expected returns of one or more assets with the market equilibrium vector of expected returns (the prior distribution) to form a new, mixed estimate of expected returns.

    The model starts with the equilibrium assumption that the asset allocation of a representative agent should be proportional to the market values of the available assets, and then modifies that to take into account the 'views' (i.e., the specific opinions about asset returns) of the investor in question to arrive at a bespoke asset allocation. Such views can be extracted using NLP tools like Named Entity Recognition from various documents under the most recent sector condition.

    Another useful NLP tool is information sentiment analysis. Web crawling with a keyword can extract the hot events/results that users care about most from different web sources. Such a keyword can also be highly adaptive to the user and weighted by its relevance. NLP can process the crawled information, extract entities and relationships, and plug the resulting financial knowledge graph into the overall database, thus continuously tracking the subjects along the time and depth dimensions.

    Based on hot events or keywords, document classification can extract topics and incrementally search for information starting from previous results, thus recommending plenty of relevant results from one context to another. This technology opens more possibilities for understanding the questions and intents users care about most.
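To make the Black-Litterman combination mentioned above concrete, here is a sketch of its simplest special case: one asset and one absolute view. In that case the posterior expected return reduces to a precision-weighted average of the equilibrium return and the view. The full model works with matrices over many assets and views; the parameter names and the numbers below are assumptions for illustration, and extracting the view itself (for example with Named Entity Recognition) is not shown.

```python
def black_litterman_single_asset(prior_return, prior_variance, tau,
                                 view_return, view_variance):
    """Posterior expected return for one asset and one absolute view.

    prior_return:   equilibrium (market-implied) expected return
    prior_variance: variance of the asset return
    tau:            scalar scaling the uncertainty of the prior
    view_return:    the investor's view of the expected return
    view_variance:  uncertainty attached to that view
    """
    prior_precision = 1.0 / (tau * prior_variance)
    view_precision = 1.0 / view_variance
    weighted_sum = prior_precision * prior_return + view_precision * view_return
    return weighted_sum / (prior_precision + view_precision)


# Hypothetical inputs: prior and view are equally uncertain here,
# so the posterior lands halfway between them.
posterior = black_litterman_single_asset(
    prior_return=0.05, prior_variance=0.04, tau=0.05,
    view_return=0.10, view_variance=0.002)
```

A more confident view (smaller `view_variance`) pulls the posterior closer to the view; a vague one leaves it near the equilibrium return.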

2.2 February 2018

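2.2.1 Industry chain analysis<2018-02-01 Thu>

  • Reason by association from rising excavator sales: in the industrial sector, upstream there is excavator manufacturing, Sany Heavy Industry (三一重工); in mining there is Zijin Mining (紫金矿业); on the demand side there is the electronics industry.
  • The Shandong vaccine case: Shijie Biotech (实杰生物) and Walvax Biotechnology (沃森生物). In February 2016 the media gradually began to report the vaccine case; on March 7 the general manager of Shijie Biotech resigned; on March 18 the Shandong drug regulator began publishing leads in the case; on the evening of the 21st the regulator announced that Shijie Biotech was involved; on the 22nd, Shijie Biotech and its parent Walvax Biotechnology (listed on ChiNext) suspended trading at the same time. From Shijie Biotech's NEEQ listing prospectus we can learn that it is engaged in vaccine agency sales and cold-chain management; that its main vaccines include the products involved in this case; and that its second-largest customer is the Shandong Center for Disease Control and Prevention. Afterwards Luyan Pharma (鹭燕医药), an SME-board company in the same business, was also implicated, and fell 5.80% right at the open on the 24th. From this chain of information, even if no direct link can be established, one can at least be alert to a major risk in the corresponding stocks.
  • Jindan Licai (金蛋理财) listed on the NEEQ (New Third Board) through a reverse merger with Nanjing Ruanzhi (南京软智). On June 3, 2015, Nanjing Ruanzhi agreed to issue 20 million shares at 1 yuan each to two outsiders including Deng Wei (邓巍), along with board resignations and the nomination of these three as incoming directors. A private placement at 1 yuan is usually an incentive for internal employees; Qiandele (钱得乐), the company owned by Deng Wei, has a business scope similar to Nanjing Ruanzhi's; and Nanjing Ruanzhi's net assets were very low, only 3 million at the end of 2013. With these three basic facts, we should have been able to identify a backdoor listing as soon as the shareholders' resolution on the placement was announced, while the actual acquisition announcement was not published until June 18.
  • Box-office fraud around Ip Man 3: Shifang Holding (十方控股) and Shenkai (神开股份). The investor behind Ip Man 3 is Kuailu (快鹿), which controls two listed companies, Shenkai and Shifang Holding. Affiliated with Kuailu is the P2P company Jinlu Caihang (金鹿财行), which acts as the funding side; Kuailu's affiliate in the film industry is Hehe Pictures (合禾影视), which is the asset side. In February 2016, Shifang Holding and Shenkai successively subscribed to the box-office revenue rights of Ip Man 3. Faking the box office could push up share prices in the secondary market for profit and offset the P2P financing cost. But box-office fraud is well known to be widespread in China, so why was it exposed at this particular moment? The National Internet Finance Association of China, established on March 25, has supervising the P2P market as its main duty, and it is hard to say that there is no connection.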

2.2.2 Reading habit<2018-02-02 Fri>

First of all, I really need to realize that my understanding has a serious problem.

How to fix it:

  1. repeat everything and rephrase it.
  2. slow down, don't skip anything, and understand the problem deeply.
  3. ask myself whether I really understand it and whether I missed any information. This can be done by re-reading.
  4. speed reading is a bad habit. Now that we have NLP, we no longer need to speed read to scan information and find patterns; the machine can do it.

Speed reading is a nice idea, and the ability to see 1000 words a minute is possible. However, you don't truly understand those words. Research is pretty limited here, but Keith Rayner's "Eye movements and information processing during reading" is one of the more comprehensive looks at how our eyes work when we're reading. Rayner believes speed reading claims are nonsense because our eyes can't work that way:

When it comes to eliminating subvocalization with techniques like meta guiding, Rayner points out you quickly lose comprehension:

You can practice going faster and you probably will, but when you start going too fast you'll start losing comprehension. Most speed reading methods involve getting rid of subvocalization. Research shows that when you do that and the text is difficult, comprehension goes to pieces.

When I'm reading a book or article I can take a few moments to pause and think about an idea. With speed reading, these moments are gone. I might consume a ton of information, but I don't feel like I actually process it. So, in short: Speed reading anything you need to truly comprehend is probably a bad idea. However, if you have a few documents you need to get through or you're reading something that isn't that important, these methods can still be worthwhile. Just know that you won't become a super-fast reading comprehension machine.

https://lifehacker.com/the-truth-about-speed-reading-1542508398

https://www.wired.com/2017/01/make-resolution-read-speed-reading-wont-help/

2.2.3 Financial knowledge graph<2018-02-05 Mon>

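In the financial industry specifically, building a knowledge graph usually takes three main steps:

- identify financial entities in massive structured and unstructured data;

- define and identify the relations between financial entities according to business needs, and so generate the knowledge graph;

- define and express business logic, and realize the value of the data by carrying out concrete tasks on the knowledge graph, such as reasoning, to lift data into intelligence.

  Step one: entity recognition extracts specific entity information from text, such as times, people, places, companies and products, which determines the nodes of the knowledge graph.

  Step two: relation recognition covers the relations between entities, such as geographic, employment and equity relations, which determine the edges between nodes. Common relation extraction methods are based either on expert knowledge bases or on machine learning. In the expert approach, industry experts build a large-scale domain knowledge base; it requires expert involvement and is generally time-consuming and laborious, but the quality is relatively reliable. The machine learning approach needs training data in the form of feature vectors and builds the relations automatically with a learning algorithm. Note in particular that for unstructured text, entity recognition and relation extraction rely on natural language processing and deep learning algorithms (for example, using word vectors to find synonyms and improve the accuracy of fuzzy entity matching); this is an iterative process of continual refinement.

  Step three: reasoning is an important feature of human intelligence. It is the process of deriving a conclusion from one or more known premises, and it can also uncover implicit knowledge in existing knowledge. Reasoning often needs the support of rules. For example, if "Person A" is the legal representative of both "Company A" and "Company B", we can infer a relationship between "Company A" and "Company B". Of course, probability comes into play here. When there is a great deal of information, combining it effectively with a reasoning algorithm is the most critical and most challenging work. Common reasoning algorithms are logic-based or based on distributed representations. With the breakthroughs of deep learning in artificial intelligence, methods based on distributed representations have become the current research focus.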
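The rule in step three can be sketched on a toy graph. The triples below stand in for the output of entity and relation extraction; the relation name `legal_representative_of` and all entity names are hypothetical, and a real system would attach a probability to each inferred link rather than treat it as certain.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (person, relation, company) triples, as if already extracted.
triples = [
    ("Person A", "legal_representative_of", "Company A"),
    ("Person A", "legal_representative_of", "Company B"),
    ("Person B", "legal_representative_of", "Company C"),
]


def infer_related_companies(triples):
    """Rule: two companies sharing a legal representative are related."""
    companies_by_person = defaultdict(set)
    for person, relation, company in triples:
        if relation == "legal_representative_of":
            companies_by_person[person].add(company)
    related = set()
    for person, companies in companies_by_person.items():
        # Every pair of companies under the same person gets a link.
        for a, b in combinations(sorted(companies), 2):
            related.add((a, b, person))
    return related


related = infer_related_companies(triples)
```

Here the nodes are the people and companies, the edges are the triples, and the inferred pairs are new edges that were not in the source data.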

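Strategy generation. At the current stage, machines cannot fully replace people in business scenarios; instead they assist people in making value and risk judgments, recommending feasible strategies based on past cases or established logic. This touches many aspects of artificial intelligence: for user interaction, intent understanding, language generation, user profile matching, etc.; at the business level, logic generation, investment models, risk models, etc.; and for data processing, rule extraction, knowledge base construction, semantic retrieval, logical reasoning, etc.

The main technologies required are document and passage segmentation, Chinese word segmentation, entity extraction and disambiguation, relation extraction, and rule base construction. Credit reporting, financing, asset management, secondary market trading and other areas of finance all have concrete business scenarios and all need business logic; on top of the data this logic takes the form of models, which must be built on basic data and domain knowledge.

  In addition, the financial knowledge graph takes many other forms. For example, A-share, Hong Kong and US listed companies, along with fundamental and market data of all kinds, are gradually being turned into knowledge graphs, and announcement data, research report data, and business registration data are all branches of the financial knowledge graph.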

2.2.4 Management case study<2018-02-05 Mon>

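In a little over half a year at the company, I have found many cases where incorrect working methods led to low efficiency on certain projects. Here I analyze the causes behind some of them, draw lessons, summarize a set of rules, and explain how to follow these rules in the future.

  • Start from small data, and enlarge the training sample incrementally after small experiments.
  • Read material with goals and questions in mind.
  • Use existing libraries from the web where possible and avoid reinventing the wheel.
  • Don't worry about looking good; worry about achieving your goals.
  • Effective, innovative thinkers make mistakes and learn from them, because that is a natural part of the innovation process.
  • Saying or writing down your own and others' weaknesses helps you remember and acknowledge them.
  • Work hard to get to the right answer.
  • Diagnose and understand the root of the problem.
  • Have a plan when running a project.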

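Having rules is extremely important, for a company or group as much as for an individual. Individuals can choose the values and principles they like when acting alone; likewise, when working in a group we must agree with the group's values and principles. If a group is not clear about its values and rules, confusion follows and it eventually drifts into mediocrity. Once a group's values and principles are clear, its way of operating (that is, its culture) will permeate everything it does, including how it sets goals, identifies problems, diagnoses problems, makes plans, and carries them out.

  • Don't worry about looking good; worry about achieving your goals.

Put your insecurities aside and work toward your goals. When you find you have made a mistake or don't know something, observe how you feel; this tests whether you care too much about looking good. If you find it feels bad, reflect, and remind yourself that the most valuable comments are accurate criticisms.

Case: Martin and I discussed the project of importing the wiki into Neo4j. On timing, he estimated one week, with one final day left for him to do validation. To save face, I assumed I could finish within the week, but the work did not start as smoothly as I had imagined. Extracting valid entities and the relations between them, and especially importing the extracted data into Neo4j, all ran into difficulties, so that by the end of work on Friday only the nodes had been imported and the edges had not been started. At that point I should have told Martin that I had hit difficulties and would not make the deadline, but to save face I did not report the project's progress in time, which affected his work schedule.

Had I actively reported progress and asked for help, the problem would have surfaced sooner and the work would have been finished faster.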

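  • Effective, innovative thinkers make mistakes and learn from them, because that is a natural part of the innovation process.

Learning from one mistake lets you avoid thousands of similar mistakes in the future. So if you treat mistakes as learning opportunities that produce rapid improvement, you should be excited by them. But if you treat mistakes as bad things, you and others will feel miserable and you will not grow. Your work environment will fill up with backbiting and malicious gossip, and will lack the healthy, honest search for truth that brings evolution and improvement. For this reason, the more mistakes you make and the more high-quality, honest diagnosis you do, the faster you progress.

  • Saying or writing down your own and others' weaknesses helps you remember and acknowledge them.

Hiding weaknesses is unhealthy, because hiding them slows down your progress in solving problems. Conversely, if you face them directly and refuse to tolerate them, you will inevitably overcome them as you evolve.

Case: Zhongwei often corrects my work and points out my weaknesses, for example that I don't read articles carefully enough and that my questions don't go deep enough. I am grateful to him for raising these issues, because it helps me correct bad habits and improve.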

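  • Work hard to get to the right answer.

With an open mind, find the right stakeholders to discuss or debate important issues until the best solution is settled. This process is the most conducive to learning and to understanding each other. Keep digging until you reach the best solution.

Case: How to determine the hash used when importing natural-language entities into Neo4j. The GID produced by the initial design was inconsistent with the GID that GS obtains from a web page via its URL. After discussing with Lai Gen (赖根), Duan Ao (段奡) and Martin, we arrived at the better of the two options.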

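  • Diagnose and understand the root of the problem.

Recognize that every problem is only a manifestation of its root cause, so use diagnosis to understand where the root lies. Do not stop digging just because the problem has been solved. Problems are products of a machine, and the machine consists of its design and its people. If both the design and the people are very good, the results will be good too (though not perfect). So when you run into a problem, analyze the design and the people, and determine which of them caused the failure and why.

Case: Coding style: the naming of functions, variables and classes is often inconsistent, with no uniform style and poor readability, because people have had different training and education.

Proposed improvement: write a unified coding style guide and follow it strictly at work.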

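  • Have a plan when running a project.

Making an effective plan requires thinking things through and charting how events unfold. The key is to visualize where you came from (or what you have done), where you are now, and how things will develop until the goal is reached. Chart the whole plan along the time dimension, like watching a movie about your past, present and future. Then write the plan down and make sure you can see it at any time. The plan should include who does what and when. The concrete task list derives from the story (that is, the plan), but they are not the same thing. The story or plan is what connects your goals with your tasks.

When designing your plan, consider the time order of the interrelated tasks. Sketch the outline first, then fill it in with concrete tasks.

Case: For the Word Embedding work, the plan should cover what material to look for at the start, what books and papers to read, what techniques to use, what data can be found, how to test on that data in the end, the strengths and weaknesses of each technique relative to the others, the bottleneck of each technique, what solutions exist, what problems these techniques can solve, what applications relate to the actual work, and where my own advantage lies in doing it this way.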

2.2.5 Social Science Paper Structure

  1. Introduction

    State the contribution.

    • Problem

    Benefits to be gained by the research or why the problem has not been solved yet.

    • Purpose

    What you are trying to achieve.

    • Own position

    Overall view of the paper. A quick summary of the form that the parts of the research paper are going to take.

  2. Literature Review
    • Previous research
    • Contribution in more detail

    Differences from what has been done before: Is it new data? A new model? A new identification strategy? Are you answering a question more broadly/specifically? Comparing specifically how you improve on a previous paper is useful.

    • Method/Model

    Exact design and methodology used to perform the research.

    To speed up understanding, connect the model with previously known models. It will help you quickly find the similarities and differences.

    • Data

    Describe the name and source of the data you are using and the period it covers.

    Present (relevant) descriptive statistics of the data.

    • Formulate

    A specific question, problem, or conjecture, and a description of the approach you will take to answer, solve, or test it. Often, this will take the form of an empirical hypothesis, e.g. social security depresses personal savings.

  3. Results

    Numerical results and data, tables or graphs of data.

  4. Discussion

    Elaborate on your findings, and explain what you found, adding your own personal interpretations.

  5. Conclusion

    Emphasizes the importance of the results in the field, and ties it in with the previous research.

2.3 March 2018

2.4 April 2018

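2.4.1 How to avoid being held hostage by the party-state:<2018-04-01 Sun>

Cultivate an independent character. Whether a person succeeds rests on personal achievement and has nothing to do with the state. Great people are great because of their achievements, not because of their background. Einstein's success was not due to his American citizenship.

Most fundamentally, cultivate your sense of citizenship. A sense of citizenship means taking responsibility for your own actions.

2.4.2 The goal of reading material is to understand the author's line of thinking, not the underlying techniques; don't reinvent the wheel.

Ask when you don't understand what you read. For example, my real problem is a shallow understanding of LSTM, and in that case going through how LSTM is built in TensorFlow is actually very time-consuming. Understanding roughly how it works is enough.

Break the goal down into sub-goals.

Also, in a presentation make the explanation of principles the focus and develop it in detail, including a description of the data, the steps, and the results.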

2.5 May 2018

2.5.1 resources are not unlimited when operating tasks. you are not Google.<2018-05-02 Wed>

2.5.2 pay attention to presentation format, keep it simple and clean, remove redundant noise.

2.5.3 define reward, action space for news recommendation use case.

2.5.4 build a knowledge graph with mentioned events in searching results, then use such knowledge graph and reinforcement learning to infer user interests.

<2018-05-07 Mon>

2.5.5 think about my relative strength:

  • get the things done.
  • don't waste too much time on something you already know will fail or will succeed.

2.5.6 set specific objectives, get positive and negative feedback.

We are human and we have small data, so we have to plan. We are not good at model-free learning, but we are good at model-based learning. We set a goal (make a plan) and get positive or negative feedback. Do a gradient search in one particular direction; do not try things at random.

2.5.7 fail as soon as possible.

2.5.8 that's a trade off between thinking big and trying many things.

2.5.9 search, machine learning, reinforcement learning, diet, exercises.

2.5.10 increase the searching space of actions doing work.<2018-05-09 Wed>

2.5.11 plan ahead, organize task as a workflow.

2.5.12 remember paper cases.<2018-05-14 Mon>

Idea: how to understand a paper. 1). Find the rules behind the paper's model. 2). Summarize the paper's rules.

How to connect papers to create useful use cases: remember specific examples from the paper via visualization. Use the rules behind the paper to create examples and ideas.

Why do you have to remember the ideas & examples in the paper? 1). Taking notes on a paper and then going back to read them is costly; you don't read 80%-90% of your notes. 2). Memory will trigger you to create similar ideas to solve some problems, and then you create examples.

Why we need an RL multi-user & multi-task workflow to do classification & clustering. 1). Original idea: multi-user & multi-task RL is based on sketch learning.

2). RL works by making plans based on different goal settings, e.g. collect resources to make tools.

3). Derived idea for trading: I. goal: a stock pool. II. goal: a trading strategy.

Results from I can feed into step II. E.g., pick stocks from an analyst profile, then create a trading strategy. Stocks Warren Buffett picked from a long-term perspective are easy to forecast, so trade these stocks with a long-term strategy.

Follow up: give specific policy instances for: 1). how to pick stocks. 2). how to trade. Search machine learning strategies based on x, y.

2.5.13 Combine NLP, RL<2018-05-15 Tue>

Topic: Combine NLP, RL and web scraping to extract useful financial research information.

Goal: Identify rising techniques that are past their early stage but not yet widely adopted.

Why? Traditional search engines are biased (paid ranking), so decisions based on the first page of search results are not always accurate for the user, especially since the results are not customized.

Proof: Reinforcement learning in the brain

A key reinforcement learning signal exists in the brain, the temporal difference reward prediction error.
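The temporal difference reward prediction error is delta = r + gamma * V(s') - V(s): the reward received plus the discounted value of the next state, minus what was predicted. A small TD(0) sketch follows; the state names, learning rate and discount factor are hypothetical, chosen only to show the error shrinking as the reward becomes predicted.

```python
def td_error(reward, value_s, value_next, gamma=0.9):
    """Temporal-difference reward prediction error: r + gamma*V(s') - V(s)."""
    return reward + gamma * value_next - value_s


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(state) toward the bootstrapped target."""
    delta = td_error(reward, values[state], values[next_state], gamma)
    values[state] += alpha * delta
    return delta


# A two-step episode repeated many times:
# cue -> reward_state (no reward), then reward_state -> end (reward 1).
values = {"cue": 0.0, "reward_state": 0.0, "end": 0.0}
first_delta = td_error(1.0, values["reward_state"], values["end"])
for _ in range(500):
    td0_update(values, "cue", 0.0, "reward_state")
    td0_update(values, "reward_state", 1.0, "end")
last_delta = td_error(1.0, values["reward_state"], values["end"])
```

At first the reward is a surprise and the error is large; after learning, the value of the cue already anticipates the reward and the error at reward time is close to zero.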

Idea:

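  1. We use SQL queries to fetch data from a relational database; can web crawling be treated as the equivalent, querying data from the internet?
  2. Judging the S-curve stage of a keyword/topic/technology requires time series data, so find websites from which such data can be extracted.

For example Google Scholar, 36kr, Seeking Alpha, GitHub, Twitter.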

S curve indicator data:

  • search Google Scholar with keyword + survey (an advanced technology should be fully covered).
  • YouTube, SSRN, arXiv.
  • number of open source libraries, number of citations (find out the technique required with the help of expertise from authors).
  • conference papers.
  • outsourcing companies/job descriptions (a mature technology is covered by outsourcing companies/job descriptions).
  • patent applications.
  • product + technology + use cases + data set (define the use case dimension, use specific use cases).
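One way to turn such indicator counts into an S-curve stage is sketched below. It assumes adoption follows a logistic curve and that a saturation level is known or estimated elsewhere; the function names, the 10% and 90% thresholds and the sample parameters are all hypothetical choices, not part of the original notes.

```python
import math


def logistic(t, saturation, growth_rate, midpoint):
    """Logistic S curve: cumulative adoption at time t."""
    return saturation / (1.0 + math.exp(-growth_rate * (t - midpoint)))


def s_curve_stage(cumulative, saturation, early=0.1, late=0.9):
    """Classify a cumulative count by its share of the saturation level."""
    share = cumulative / saturation
    if share < early:
        return "early"
    if share < late:
        return "rising"
    return "mature"


# Hypothetical curve: 1000 cumulative mentions at saturation, midpoint at t=5.
stages = [s_curve_stage(logistic(t, 1000.0, 1.0, 5.0), 1000.0)
          for t in (0, 5, 10)]
```

In practice the cumulative count would come from the time series collected above (papers, repositories, job postings), and the "rising" band is the window the goal in this entry is after.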

  3. Design an agent ranking feature to identify useful agents and distribute more jobs to them, because they receive more reward than other agents.
  4. GS can replace traditional expert labeling with online labeling (knowledge -> policy):

taking notes, changing parameters, generating data; design the action space; translate into action rewards.

  5. Difference between GS and Ricequant:

Ricequant is a platform providing data/tools. GS has a workflow to ease the research process.

2.5.14 Learning

We have two types of neurons in the studying process: focused neurons and diffuse neurons.

Our focused neurons are active when we read and think; our diffuse neurons work when we do activities other than studying.

To activate those synapses, we need to practice with different methods or exercises.

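2.5.15 too little communication<2018-05-17 Thu>

  1. observation

    Long-term and short-term progress is not reported often enough.

  2. reason

    Long-term goal milestones are hard to reach in one go. Short-term progress often has not produced results or has mostly failed, so it goes unreported.

  3. experience

    goal: reporting negative examples (bad cases) can help GS solve the problems we met.

    To know why others are so concerned, look for the answer at the root. It is not much help if GS solves the problem as a solo piece of software.

    increase precision: when communicating, ask others more often whether they understand what I said, to increase feedback.

    increase recall: increase the amount and frequency, explain things from their root logic, and check whether my own logic is the same.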

2.5.16 presentation instruction feedback, communication<2018-05-18 Fri>

  1. differentiate clear points from confusing points, and mark the latter.
  2. understand the meaning of all symbols.
  3. take out the example from the paper.
  4. classify the function of the paper, understand the input and output.
  5. quote original text from the paper if possible.
  6. draw connections/rules from the paper to our use case.
  7. spend more time on thinking, abstracting, comparing.
  8. have hierarchical goals when reading. One read-through is certainly not enough, and the goals of the first and second passes are different.

communication skills connected with reinforcement learning gradient search:

  1. to avoid getting stuck in a local minimum, we need to push the gradient search from the local region to the global region, if the function is non-convex.
  2. giving feedback in the team, whether positive or negative, helps others know the state of your project, so they can help you get out of trouble.
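A sketch of the local-versus-global point, using a made-up non-convex function with one local and one global minimum. Restarting gradient descent from several starting points is used here as one simple way to widen the search; the function, step size and starting grid are assumptions for illustration.

```python
def f(x):
    """A non-convex function with a local and a global minimum."""
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    """Follow the negative gradient from x0; ends in the nearest basin."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


def multi_start(starts):
    """Run gradient descent from each start and keep the lowest result."""
    candidates = [gradient_descent(x0) for x0 in starts]
    return min(candidates, key=f)


single = gradient_descent(2.0)                    # stuck in the local minimum
best = multi_start([-2.0, -1.0, 0.0, 1.0, 2.0])  # finds the global one
```

The single run never leaves the basin it starts in; the extra starting points play the role that outside feedback plays in the second item.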

2.5.17 What is missing from online edu or MOOC hype since 2012 and how Gs can fix it in vertical domain <2018-05-21 Mon>

  1. Most online education is still too much like traditional courses and too little like gaming and trading.
  2. Copy the really successful gamers and self-learners and get rid of traditional course formats; customize to a multi-task, multi-user workflow including concepts, use cases, applications, debugging, and searching, supported by NLP and RL technology.
  3. Match students directly with projects and employers, who design and pick courses and tasks at different levels; make the two-way reward system much more flexible to fit specific tasks.
  4. MOOCs failed to use a new form of education, based on ML and task-based skills, to win over the core customers of traditional education at top schools.
  5. Replace traditional homework with discussion, reading and writing papers, product design and demos, PPT presentations, and fundraising: basically real tasks instead of toys, more like MBA courses.
  6. Online courses are missing team building and real ease for anyone to design a course or RL task with goals, rewards, etc.
  7. GS can target low-tech universities, covering both the really high end of traders and the low-tech end.
  8. Turn each paper, research report, and news item into a simple course snippet with homework questions designed by the author, peers, and oneself.
  9. Use RL to understand why pre-commitment works: it gives an intrinsic reward right after committing, before the real external reward arrives, in sparse-reward situations.
  10. Online courses are not designed around search and trial and error.
  11. The trend is toward much more learning over a human lifetime and less time producing; but once something is produced, it can be massively leveraged into services and shared knowledge.
  12. Adjust notification frequency with RL instead of interrupting in real time, or judge the cost of interruption before alerting.
  13. Changing a habit can have the downside of reducing a relative strength that comes from that habit; it is best to be aware of the habit's weakness and use teamwork and tools to complement it.
  14. Make GS work so interesting that interns want to work for free, filtering for those with the right passion and forcing GS to be good enough that users need not be paid to use it as a job; money can be a short-term initial reward to get users to try the product, like B2C marketing strategies, e.g. coupons, Uber, etc.
  15. Uncertain outcomes are addictive because, in the brain's backpropagation-like algorithm, the reward is triggered by the error term y hat - y; this is like an epsilon gradient algo.
  16. A good long-term goal is one that can predictably trigger more external short-term rewards.
  17. Hierarchical tasks and multi-user games can overlap in game setup but differ in level for different players, e.g. "cutting leeks" (fleecing retail investors) and "tighten control and it dies, loosen control and it turns chaotic" are basically game settings with demographic support serving as the environment for policy makers as agents; agents of different people who inherit from previous people can be seen as one agent.
  18. Turn a paper into a structured document, then summarize the structure and provide a quiz. Each reading pass yields a different level of understanding.
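
A minimal sketch of the prediction-error idea in item 15, assuming a simple running-average learner (a Rescorla-Wagner style update, chosen here only for illustration). The reward functions and learning rate are hypothetical. With a certain reward the error signal dies out; with an uncertain reward of the same average size it never does.

```python
import random


def mean_abs_prediction_error(reward_fn, lr=0.1, steps=500, seed=0):
    """Learn a reward estimate and return the mean |error| over the last 100 steps."""
    rng = random.Random(seed)
    estimate = 0.0
    errors = []
    for _ in range(steps):
        reward = reward_fn(rng)
        # Prediction error, written here as actual minus predicted
        # (the note writes it as y hat - y; only the sign differs).
        error = reward - estimate
        estimate += lr * error
        errors.append(abs(error))
    return sum(errors[-100:]) / 100


# Certain outcome: always 1.0. Uncertain outcome: 2.0 or 0.0 with equal chance.
certain = mean_abs_prediction_error(lambda rng: 1.0)
uncertain = mean_abs_prediction_error(lambda rng: 2.0 if rng.random() < 0.5 else 0.0)
print(f"certain: {certain:.4f}  uncertain: {uncertain:.4f}")
```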
  1. Follow up
    1. Search reinforcement learning + cognitive bias.
    2. Search reinforcement learning + habits, or anything that can improve UX and the human RL workflow.
    3. A skill truly learned by the brain can be recalled at the right time with the least mental effort.
    4. Why does mindfulness meditation work? Look at it from a neuroscience point of view.
    5. Combining neuroscience, psychology, economics (team games), etc. enlarges human intuition for retrieving known test cases from memory and from low-cost self-experiments.
    6. Ask role 2: in what scenario or state do you start to put two things together (attention alignment) to induce and deduce facts and hypotheses?
    7. GS: use different forms of questions to help humans recall attention-based memory aligned with the current task: multiple-choice questions, then specific episodes, then analogy-like questions.
    8. Homework is hard and gets procrastinated on because it lacks hints and rewards and focuses too much on large-step recall; detailed hints are not customized to each student's KB.
    9. When reading, mix use-case and theory books to help make connections, e.g. a China or industry analysis book with a causality or NLP book; mix a software book with a use-case or theory book for practical product connections. Read material that is adjacent in knowledge but from different categories.
    10. Let analysts give specific scenarios, each explained by the reason for the agent's action; RL matches actions to a general causal workflow and generates more scenarios, which are then checked again by humans for labeling.
    11. Ask role 2 to review their own workflow and come up with a better action or policy instead of just writing down the workflow step by step.
    12. To be like Marty Yu: when reading a financial report, think about management decisions and the market/regulatory environment to come up with a causal story verified by numbers, i.e. how the game was played as an action sequence, with concrete episodic examples of the firm to retrieve related data points.
    13. Apply the attention technique to align the current want or goal with the contents or inputs, i.e. always read with a question in mind (the goal) and take the dot product with the content to get the attention that decides the next move or state.
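
The dot-product attention idea in the last item can be sketched as follows. This is a minimal illustration, assuming hypothetical 3-dimensional embeddings for the reading goal (the query) and for three passages (the keys); real embeddings would come from a model.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Dot product of the query with each key, normalized with softmax."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


# Hypothetical embeddings: the question in mind and three candidate passages.
goal = [1.0, 0.0, 1.0]
passages = [
    [1.0, 0.0, 1.0],   # closely aligned with the goal
    [0.0, 1.0, 0.0],   # unrelated
    [0.5, 0.5, 0.0],   # partly related
]

weights = attention_weights(goal, passages)
next_passage = max(range(len(passages)), key=lambda i: weights[i])
print(weights, next_passage)
```

The passage with the largest weight is the one to read next; without a goal vector there is nothing to take the dot product with, which is the point of reading with a question in mind.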

2.5.18 recall knowledge from the principle or from use cases (that apply the principle)

Connect the observed information with your own knowledge trunk.

2.5.19 need to comprehend conversation more deeply before giving a response.<2018-05-29 Tue>

Find a comprehension QA dataset whose questions need deep understanding to answer.

2.6 June 2018

2.6.1 negative thinking will shut your brain down. feel safe and speak frankly.<2018-06-12 Tue>

2.6.2 review your daily work with a purpose, so you can improve your skills and increase your working efficiency.<2018-06-19 Tue>

2.6.3 productive meetings<2018-06-25 Mon>

  1. Make sure everyone is on the same page
  2. Use a whiteboard to stay on track
  3. Assign follow up
  4. Use the "two-minute rule"

2.6.4 Give an alert when you are waiting for somebody else to finish the rest of the job.<2018-06-29 Fri>

3 2018 second half

3.1 August 2018

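3.1.1 Ride-hailing O2O:

Ecosystem:

  • Intelligent customer service, chatbots
  • Face recognition; driver and passenger facial-expression recognition
  • Intelligent dispatch; supply-demand forecast heat maps
  • Auto finance
  • Intelligent pricing: prices differ by vehicle type and by city
  • Route planning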

4 2019 first half

4.1 June 2019

4.1.1 Recommendation system

Customers get lost when they face thousands of goods while trying to buy something; the purpose of a recommendation system is to increase sales. From the seller's perspective recommendation is good, but as a customer you will spend more time and money if the recommendation system works well. If you want to save time and money, stay away from systems with a recommendation function.

4.1.2 Unsupervised learning

anomalous driving

5 2019 second half

5.1 August

  • China is labeled a currency manipulator by the US.
  • Gold and bonds are surging because of risk aversion.
  • The 2-year bond yield being higher than the 10-year bond yield (an inverted yield curve) means people are pursuing risk-free assets. The market may face a crisis or recession.
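
A minimal sketch of the inversion check in the last bullet. The yield figures below are hypothetical placeholders, not market data, and the function names are illustrative.

```python
def spread_10y_2y(yield_2y, yield_10y):
    """10-year yield minus 2-year yield, in percentage points."""
    return yield_10y - yield_2y


def is_inverted(yield_2y, yield_10y):
    """The curve is inverted when the short yield exceeds the long yield."""
    return spread_10y_2y(yield_2y, yield_10y) < 0


# Hypothetical yields in percent.
normal = is_inverted(yield_2y=1.50, yield_10y=2.25)     # False: long yield is higher
inverted = is_inverted(yield_2y=1.75, yield_10y=1.50)   # True: short yield is higher
print(normal, inverted)
```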