How Do Top Tech Companies Recruit First-Rate Talent?

Google, Facebook, Apple, Amazon and the other top tech companies receive astronomical numbers of resumes every year, so one thing is certain: they have their own methods for screening talent. But what exactly are those methods? The answer is definitely not the "Google interview questions" so popular online. In fact, the series of subtle adjustments they have made to the hiring process goes far beyond algorithms and quantum physics. If you too want to hire top talent, try the tricks below.

1. Call 15 minutes early or late, or simply never on time

Why? The goal: find people who are ready for work at every moment. If you call at a "predictable" time, anyone can prepare thoroughly. But what if you call while they are asleep, at dance class, or in the bathroom? This is the top tech companies' secret for finding employees who are "ready to work at any time."

2. Keep the interview schedule chaotic and unpredictable

Why? The goal: find people who don't need to rely on a "manual." Making sure that neither the interviewers nor the candidate knows what will happen during the interview is the perfect test of who performs best without any "cues."

3. Let something go wrong during the presentation

Why? The goal: test whether candidates can adjust when the environment is less than ideal.

Deliberately pick a room where the presentation equipment is faulty. If candidates can take it in stride and cope with composure, that shows they will be easy to work with on the job. If they also come with Plans B, C, and D prepared, award extra points; in the tech business, that habit matters.

4. Keep throwing out false assumptions during the interview

Why? The goal: weed out candidates who anger easily. If their last job was at Twitter, ask: "How long did you work at Yahoo?"

Note and record the tone in which they correct you. Furious, or calm? This is how tech companies gauge how a candidate will perform when trouble hits.

5. Have them solve your specific problem

Why? Because you genuinely need the help. Tech companies often have applicants try to solve actual problems the company is facing. It's a great way to get free help.

6. Shuffle candidates through room after room

Why? The goal: find people who stay excited even while uncomfortable. Don't let job seekers get too comfortable during the interview. This is how you find employees who remain enthusiastic in uncomfortable conditions, and it also demonstrates that no conference room is ever free for a whole day.

7. Ask the same question over and over

Why? The goal: test consistency. In tech, predictability is a good thing. Don't worry about repeating the same question; it's a great way to test a candidate's "consistency." Only candidates interviewing for senior roles may be wildly inconsistent with their own answers.


8. A dual interview with a "good cop / bad cop" atmosphere

Why? The goal: find people who can multitask under high pressure. Seat the candidate in the middle of the conference room, with interviewers on both sides of the table.

Can the candidate attend to both interviewers' questions at once and answer each of them fully? Or have they been ground down to exhaustion, bewildered as to why they ever agreed to such an interview? It's a great way to test how a person behaves under pressure.

9. Ask a question, then type loudly

Why? The goal: find employees who stay focused in distracting environments. Ask the candidate a question, and the moment they begin to answer, start typing loudly. Apologize at the same time, saying you are "listening, just taking notes."

You may indeed be taking notes, or you may be emailing the father you rarely contact; it doesn't matter what you're doing. Watch whether the candidate stays focused on your question or begins to drift. This will help you find employees who won't be distracted at work.

10. Three months later, tell them they've been hired, but not for the position they applied for

Why? The goal: find people whose resolve never wavers. This method weeds out anyone who was never that committed to the job in the first place.

Did the candidate fight for the position they wanted? Or did they accept because they figured it was the best they could get? Or had they in fact found another job months ago, and so turned you down? This is a great way to find out.

Translated from 硅发布 (SVInsight); the original is by former Google employee Sarah Cooper.

Afanti (阿凡题) CEO Chen Lijiang: Restructuring Online Education Through the Sharing Economy


On January 19, 2016, Afanti, a project previously covered by Tencent Startup (腾讯创业), held its first press meetup in Beijing, more than 500 days after its official launch. Afanti now has over 20 million registered users, an average of 15 minutes of use per user per day, and tens of thousands of teachers on the platform. Over those 500-plus days, Afanti has built "China's largest live online Q&A platform staffed by real teachers."

"The feelings and needs of the learner are all too often ignored," said Chen Lijiang, Afanti's founder and CEO. "As a former top student and a growing product manager, when we reflect on the service standards of online education, we believe it should be more than just moving the mountain of books and sea of problems online."

China's largest live Q&A platform: no more going to sleep with unanswered questions

"Afanti's survey data show that Chinese primary and secondary students generally sleep less than 7 hours, and nearly 90% of high school students are still up at 11 p.m. doing homework," said Wang Qingyuan, Afanti co-founder and COO. "Afanti wants to be a product that genuinely helps students, one they love from the heart."

In November 2015, Afanti published a national survey on students' study pressure, which found that Chinese students spend twice the global average on homework, and that 68.9% of students see hard problems as the main thing stalling their homework. When stuck, nearly 70% of students think asking a teacher is the best solution, yet in practice only 10% of problems are actually resolved by a teacher.

For the 200 million primary and secondary students who cannot quickly find a teacher when homework stumps them, Afanti launched a live teacher Q&A service in the fall of 2015, the "instant tutoring" product, whose promise of "find a teacher in 30 seconds, work through a problem in 5 minutes" gets stuck students unstuck fast. With it, Afanti's three-tier Q&A service took shape:

Photo search: the industry's first dual "machine solving + human answering" model, the first to answer every question within one minute, and the industry's only online Q&A product with a 100% answer rate;

Live Q&A: tens of thousands of real teachers answering continuously, with strong follow-up interaction between students and teachers; Afanti's distinctive mutual-follow relationships have already produced "star teachers" with over ten thousand followers;

Instant tutoring: for the problems a student finds hard, a teacher gives one-on-one tutoring in a live, classroom-like format, for personalized answers with no blind spots.

Today, with tens of thousands of real teachers, Afanti has become "China's largest live online Q&A platform," so that students no longer go to sleep with unanswered questions.

Online education "black tech": the intelligent tutoring robot of the future

Launched in May 2015, Afanti-X is the world's first product to recognize both printed math, physics, and chemistry and handwritten formulas and calculations. It can perform basic arithmetic, solve linear equations in one unknown, quadratic equations, and systems of two linear equations within one second, and show the steps of the computation. This was the first double breakthrough worldwide combining handwriting recognition and artificial intelligence; Afanti-X's core technology has an international patent pending.

Notably, to serve as the AI's "eyes" and judge handwriting precisely, the Afanti-X team broke new industry ground during development, collecting 1.28 million real handwritten character samples for the first time, then applying optical character recognition, denoising, and character segmentation to connect the calculator with handwritten formulas. Thanks to the combination of neural network models and deep learning, Afanti-X completes a computation and produces the solution steps within one second, breaking through the bottleneck of fixed problem banks and opening unlimited possibilities for Q&A.
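As a purely illustrative sketch of the kind of pipeline described here (binarize, segment characters, then classify), and emphatically not Afanti's actual system, the skeleton might look like this in Python; the input array and the stub classifier are made up:

import numpy as np

# 'img' stands in for a scanned grayscale page: white background, dark ink.
img = np.random.rand(64, 256)  # made-up input, for illustration only

binary = img < 0.5                    # crude binarization / "denoising" step
ink_per_column = binary.sum(axis=0)   # project ink onto the x-axis

# Character segmentation: split at blank columns between runs of ink.
segments, start = [], None
for x, ink in enumerate(ink_per_column):
    if ink and start is None:
        start = x
    elif not ink and start is not None:
        segments.append(binary[:, start:x])  # one candidate character
        start = None
if start is not None:
    segments.append(binary[:, start:])       # trailing run of ink

def classify(char_img):
    # Placeholder: a real system feeds each segment to a trained
    # neural-network classifier, as the article describes.
    return "?"

print([classify(s) for s in segments])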

This means that as users keep interacting with it, Afanti-X will grow ever "smarter"; in the future it could handle any problem, and might even become an "intelligent tutoring robot" that answers students' questions and helps teachers teach.

Beyond that, Afanti is entering a strategic partnership with Aofei (奥飞), the powerful Chinese animation and culture industry group that recently acquired "U17" (有妖气), to jointly develop children's educational content.

Hailing a teacher like a Didi ride: restructuring K-12 online education

For primary and secondary students, learning is a cycle: teachers impart knowledge, homework feeds back how well students have mastered it, and the cycle deepens step by step. In the feedback stage, students who hit hard problems in their homework often cannot get answers. Afanti's research found that parents universally want one-on-one educational service for their children, and students want a teacher who understands them and appears exactly when a question arises.

Students' pain in the face of hard problems is why Afanti went deep into online Q&A, and teachers are the key to relieving it. The need is analogous to transportation: everyone wants a comfortable ride, so when Didi appeared, that need was met far better. Facing the high-frequency, hard demand of Q&A, Afanti aims to meet students' varied needs through several levels of convenient service:

1. Questions a student can understand independently, plus review of knowledge points and extension practice, are handled by photo search and AI;

2. Questions a student cannot work out alone go to live Q&A, where asking and follow-up exchanges with a teacher break the difficulties down one by one and build a stronger teacher-student relationship;

3. Instant tutoring targets the big, personalized sticking points. Every student's mastery differs, so the sticking points differ from person to person; having a teacher who knows the student explain the key concepts yields twice the result for half the effort.

Because students and teachers on Afanti call and respond the way riders and drivers do on Didi, Afanti, the first to launch live-teacher instant tutoring, earned the nickname "Didi for teachers": whenever students hit a hard problem, they can summon a teacher with one tap, just like hailing a car.

"Afanti's model borrows from the idea of the sharing economy," Chen said. "By sharing China's best educational resources, we hope to give every child the chance to become a top student."

Afanti's ambition is to become China's largest online school. Through it, the country's 200 million primary and secondary students would connect seamlessly with more than 13 million teachers; children in every region of China, developed or not, could find teachers who suit them, enjoy the country's best educational resources and rich learning opportunities, and stand at the same starting line with every chance of becoming top students.

Source: Tencent Startup (腾讯创业)

Good at Both Math and Programming: A Complete Guide to Becoming an Investment Banking Quant

2016-04-12 UniCareer

What does a quant do?

A quant's job is to design and implement mathematical models of finance (mainly through computer programming), including pricing derivatives, estimating risk, and predicting market behavior. So a quant is better viewed as an engineer, in China's customary classification a science-and-engineering talent rather than a humanities one, which sets the role somewhat apart from finance proper (though finance, too, has plenty of technical content).

What kinds of quants are there?

(1) Desk Quant

A desk quant develops pricing models used directly by traders. The upside is being close to the money and the opportunities of trading; the downside is heavy pressure.

(2) Model Validating Quant

A model-validating quant also develops pricing models independently, but in order to verify the correctness of the models the desk quants build. The upside is an easier pace and less pressure; the downside is that such teams tend to have little say and sit far from the money.

(3) Research Quant

A research quant tries to invent new pricing formulas and models, and sometimes carries out blue-sky research (open-ended, curiosity-driven work). The upside is that it is interesting (for those who like this sort of thing) and you learn a great deal. The downside is that it can be hard to prove your existence matters (as with scientists: without a big result, nobody notices you).

(4) Quant Developer

Really a programmer with a prettified title, but the pay is good and jobs are easy to find. The work varies a lot: it may mean writing code all the time, or debugging other people's large systems.

(5) Statistical Arbitrage Quant

A statistical arbitrage quant looks for patterns in data to drive automated trading systems (arbitrage systems). The techniques are quite different from those of derivatives pricing and are used mainly in hedge funds, and compensation in such positions is extremely volatile.

(6) Capital Quant

A capital quant builds the bank's credit and capital models. It is less glamorous than derivatives pricing work, but with the arrival of the Basel II accord it is becoming more and more important. You get decent (though not spectacular) pay, less stress, and shorter hours.

People go into finance to make money; if you want higher income, move closer to where the money is "produced." This breeds a certain disdain, among those close to the money, for those farther away. As a basic rule, being close to the money beats being far from it.

What areas do quants work in?

(1) FX

FX is short for foreign exchange. Contracts tend to be short-dated, large in notional, and simple in terms, so the emphasis is on building models very quickly.

(2) Equities

Equities means options on stocks and indices. The techniques lean toward partial differential equations (PDEs). It is not an especially large market.

(3) Fixed Income

Fixed income means interest-rate derivatives. By notional value it may be the largest market. The mathematics is more complex, because the problems are fundamentally multi-dimensional; technical tricks abound, and the pay is relatively high.

(4) Credit Derivatives

Credit derivatives are products built on whether companies repay their debt. The field has grown very fast, with strong demand and therefore high pay; that said, it reflects some of the bubble elements of the current economy.

(5) Commodities

Commodities has also become a fast-growing field, owing to the broad rise in commodity prices in recent years.

(6) Hybrids

Hybrids are derivatives spanning more than one market, typically interest rates plus something else. The main advantage is that you learn across several fields; it is also a very popular area at present.

Where do quants usually work?

(1) Commercial banks (HSBC, RBS)

Commercial banks ask less of you and pay less, and the work is relatively stable.

(2) Investment banks (Goldman Sachs, Lehman Brothers)

Investment banks demand long hours but pay well. The jobs are not very stable. On the whole, American banks pay more than European ones but expect longer hours.

(3) Hedge funds (Citadel Group)

Hedge funds demand long hours and heavy workloads. They are fast-growing and unstable at once: you may reap large rewards, or be fired within a few months.

(4) Accounting firms

The large accounting firms have their own quant consulting teams, and some even send employees to Oxford for a master's degree. The main drawbacks are that you are far from the real action and decisions, and since the strongest people prefer banks, it is harder to find someone to learn from.

(5) Software companies

Outsourcing quant models is increasingly popular, so software companies are an option too; the drawbacks are similar to those of accounting firms.

Which books should an aspiring quant read?
UniCareer's recommended reading list
 
《Options, Futures and Other Derivatives》
John C. Hull

Useful whether you are job hunting or already a senior quant. John Hull himself is formidable, with pioneering results in many areas, and is now at the University of Toronto. A classic among classics, reasonably broad in coverage, though not very mathematical. Known as the bible of Wall Street, so naturally it is not too hard.

《Stochastic Calculus for Finance II》
Steven E. Shreve

Shreve's newer book: very elegant, very careful, mathematically complete, well suited to readers with a math background, but on the thick side; Volume I is still the better entry point. The author, now a professor at CMU, is a top figure. Volume I covers discrete models; Volume II covers continuous models.

《Liar's Poker》
Michael Lewis 

About the arbitrage team at the former Salomon Brothers, then the world's best quant traders. Everyone in trading reads this book.

《C++ Design Patterns and Derivatives Pricing》
Mark S. Joshi 

Important for anyone with basic C++; more importantly, it teaches you Monte Carlo (see the short pricing sketch after this list).

《Modeling Derivatives in C++ (Wiley Finance)》
Justin London 
After a year of studying financial engineering, this turned out to be the most essential and practical book; the theory books you can just skim. Many theory books overlap heavily, so note the distinctions.
《The Concepts and Practice of Mathematical Finance》
Mark S. Joshi 
This book aims to cover the areas of knowledge a good quant should command, including some programming projects you are strongly advised to work through before applying for jobs.
《Interest Rate Models – Theory and Practice》
Damiano Brigo / Fabio Mercurio 
A book with stellar reviews. Its greatest treasure is the treatment of the Libor market model. Its hallmark is that the authors lay out every detail, with extensive numerical results, which makes self-study and verification convenient.
《Probability with Martingales》
David Williams
Organized around martingales. The opening part introduces just as much measure theory as the later probability theory needs, no more, so you can follow it even without prior measure theory. No wonder it is aimed at undergraduates. Williams is the best writer among mathematicians in this area.
 
《Monte Carlo Methods in Financial Engineering》
Paul Glasserman
Very practical and true to its title: Monte Carlo applications in financial engineering. Readers who have actually worked through it tend to agree it is not ideal as a first book; it pays off more, and reads more comfortably, once you have some grounding in both subjects.
《My Life as a Quant: Reflections on Physics and Finance》
Emanuel Derman
The author is a first-generation quant, formerly head of the quant research group at GS, now at Columbia, and a leading figure in stochastic volatility, among several other fields.
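Since the Joshi book above centers on Monte Carlo, here is a minimal illustrative sketch of the technique in Python (my own example, with made-up parameters): pricing a European call under Black-Scholes dynamics by simulating the terminal stock price.

import math, random

# Illustrative only: Monte Carlo price of a European call under
# risk-neutral geometric Brownian motion. Parameters are made up.
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_paths = 200_000

random.seed(42)
payoff_sum = 0.0
for _ in range(n_paths):
    z = random.gauss(0.0, 1.0)
    # Terminal price: S_T = S0 * exp((r - sigma^2/2) T + sigma sqrt(T) Z)
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    payoff_sum += max(ST - K, 0.0)

price = math.exp(-r * T) * payoff_sum / n_paths
print(f"Monte Carlo call price: {price:.3f}")  # close to the Black-Scholes value of about 10.45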
How to claim the bonus

1. Follow the WeChat account: UniCareer
2. Reply with the keyword: 我爱读书

3. Follow the prompts to complete the steps

Within 3 working days, the ten quant books will arrive in your inbox.

What does becoming a quant require you to know?

What you need to learn varies greatly with where you want to work. At the time this article was written (1996), I would suggest simply mastering my book in full. Many people mistake learning this material for merely reading the books; what you must do is truly learn it, as if preparing for an exam based on the books' contents. If you are not confident you would earn an A on that exam, do not interview for any job.

Interviewers care more about how thoroughly you grasp the basics than about how much you know. Showing interest in the field also matters: read the Economist, the FT, and the Wall Street Journal regularly. Interviews will include basic calculus or analysis questions, for example: what is the integral of log x? Questions such as how the Black-Scholes formula is derived are also perfectly normal, as are questions about your thesis.
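For the record, both of those stock questions have crisp, standard answers (stated here for reference): integration by parts gives

$$\int \ln x\,dx = x\ln x - x + C,$$

and the Black-Scholes price of a European call, in the textbook form found in Hull's book above, is

$$C = S_0\,N(d_1) - K e^{-rT} N(d_2), \qquad d_{1,2} = \frac{\ln(S_0/K) + (r \pm \sigma^2/2)\,T}{\sigma\sqrt{T}},$$

where N is the standard normal CDF, S_0 the spot price, K the strike, r the risk-free rate, σ the volatility, and T the maturity.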

The interview is equally your chance to choose the firm: what kind of people they like and what they care about can be read from their questions. If they ask a lot of C++ syntax questions, be careful, unless that is the work you want. Generally speaking, a PhD is necessary for a quant offer.

A master's degree in financial mathematics will land you in bank risk or trading support rather than directly in quant work. Banking is becoming ever more mathematical, so that background helps in many areas of a bank.

In the US it is increasingly common to take a master's degree after a PhD; in the UK this is still rare.

Which majors map to quant roles?

From observation, quants usually majored in mathematics, physics, or financial engineering (financial mathematics). Though not in large numbers, some investment banks do hire Master's-level financial engineering graduates as quants, and the good FE programs regularly send Master's students into quant roles.

Programming      

All types of quants spend a large share of their time programming (more than half). Even so, developing new models is itself interesting work. The standard implementation language is C++, so an aspiring quant needs to learn C++. Some places use Matlab, a useful skill too, though less important than C++. VBA is also widely used, but you can pick it up on the job.

Income   

How much does a quant earn? An inexperienced quant makes roughly US$60k-100k a year before tax. Bonuses will not be huge, but in a good year they are considerable; from what I hear, a first-year hire can typically expect a bonus of ten or twenty thousand dollars. If your salary falls outside this range, ask yourself why. Income grows quickly, and bonuses become a large share of total compensation. Don't fixate on the starting salary; weigh the role's growth and learning opportunities instead.

Hours 

A quant's hours vary a lot. At RBS we worked 8:30 to 6pm. Stress varies too; some American banks expect longer hours. London gives 5-6 weeks of holiday a year, while 2-3 is normal in the US.

The Avalanche Effect of Pure Mathematics: How Did the Poincaré Conjecture Come to Benefit Precision Medicine?

2016-04-12 Xianfeng Gu (顾险峰), 赛先生 (Mr. Science)


Figure 1. A computer 3D model illustrating the Poincaré conjecture.

Xianfeng Gu (tenured professor at Stony Brook University, visiting professor at the Yau Mathematical Sciences Center, Tsinghua University, and founder of computational conformal geometry)

Recently Matt Ridley, a member of the British House of Lords, published "The Myth of Basic Science" in The Wall Street Journal. He argues that the notion that "science drives innovation and innovation drives commerce" is basically wrong; rather, commerce drives innovation and innovation drives science, just as scientists are driven by practical needs and not the other way around. In short, he holds that "scientific breakthroughs are the effect of technological progress, not the cause."

Ridley's remarks reflect a serious misunderstanding of basic science held by many people, one that can confuse young students' thinking and values, and it deserves to be set straight. True, commercial demand and engineering practice supply raw material for basic research: historically, optimal mass transportation theory and the Monge-Ampère equation originated in the transport of earthworks, and the conjecture was ultimately resolved by Kantorovich, who won the Nobel Prize in economics for the work. A few years ago, to address the compression of medical images, Terence Tao and his collaborators proposed the theory of compressive sensing. But fundamentally, the driving force of basic science is scientists' curiosity about natural truth and their pursuit of aesthetic value. Breakthroughs in basic science, because they reveal objective truths of nature, often set off revolutions in applied science. Because pure mathematics is obscure and abstract and its practical value is rarely obvious, the public has long tended to regard it as "useless." In reality, its guiding role for applied science is irreplaceable.

One facet of the development of computer science and technology is the conversion of millennia of accumulated human knowledge into algorithms, so that people without professional training can directly use the deepest mathematical theories. In topology and geometry, many theorems that are centuries old have only recently been turned into algorithms. But as computing advances rapidly, the path from theorem to algorithm keeps accelerating: many newly developed mathematical theories have quickly become powerful algorithms, widely applied in engineering and medicine.

History shows again and again that the fundamental breakthroughs of curiosity-driven basic research often go unnoticed by the society of their time, like a faint cry in a glacial valley that fades on the wind; yet that one cry can set off an avalanche that darkens the sky and shakes the earth. The proof of the Poincaré conjecture is a vivid example. The avalanche has not yet been noticed by the public, but it has already, irreversibly, begun!

1. The Poincaré conjecture

The French mathematician Jules Henri Poincaré was the founder of modern topology. Topology studies the properties of geometric objects, such as manifolds, that are invariant under continuous deformation. Imagine a surface made of a rubber membrane: we may stretch, compress, twist, and curl the membrane, but never tear or glue it. Such deformations are continuous (topological) deformations, and the quantities they preserve are topological invariants. If one rubber surface can be deformed topologically into another, the two surfaces share the same invariants and are topologically equivalent. As Figure 2 shows, if the bunny surface is made of a rubber membrane, we can inflate it like a balloon into the standard unit sphere, so the bunny surface and the unit sphere are topologically equivalent.

Figure 2. The bunny surface can be continuously deformed into the unit sphere, so the two are topologically equivalent.

The bunny surface cannot be continuously deformed into the shape of a tire, or into any of the surfaces in Figure 3. Intuitively, the cat surface in Figure 5 has one "hole," or "handle"; the surface in Figure 3 has two. In topology, handles are counted by the genus, the most important topological invariant: all orientable closed surfaces are completely classified by their genus.


Figure 3. A closed surface of genus 2. Genus is the most important topological invariant of a surface.

Poincaré pondered the following deep question: are the "holes" of a closed surface an intrinsic property of the surface itself, or a relation between the surface and the ambient space into which it is embedded? The question itself is abstruse, so let us attempt an intuitive explanation. We humans can see the holes formed by handles because the surface is embedded in three-dimensional Euclidean space; the holes thus reflect how the surface sits in the ambient space, and we have reason to guess that genus reflects a relation between the surface and that space.

Figure 4. How could an ant living on a surface detect the surface's topology?

On the other hand, imagine an ant that has lived on a surface since birth and has never jumped off it, and so has never seen the surface as a whole. The ant has only two-dimensional concepts, no notion of a third dimension. If the ant had highly developed intelligence, could it determine whether the surface it lives on is a topological sphere of genus 0, or a surface of higher genus?

Figure 5. On a surface of genus 1, a closed loop that cannot be contracted to a point.

Poincaré finally arrived at a simple yet profound criterion for deciding whether a surface is a genus-0 topological sphere: if every closed curve on the surface can be gradually contracted to a point within the surface, then the surface must be a topological sphere. Consider the cat surface in Figure 5: a closed curve around the neck can never be contracted to a point, however it is deformed on the surface. In other words, as long as the genus is nonzero, there exist closed loops that cannot be contracted to points. A manifold in which every closed loop contracts to a point is called simply connected. Poincaré generalized this observation to higher dimensions and proposed the famous Poincaré conjecture: if M is a closed, simply connected three-dimensional manifold, then M is topologically equivalent to the three-dimensional sphere.

Figure 6. Three-manifolds with boundary, represented by triangulations.

In the physical world we cannot see a closed three-manifold: just as a closed surface cannot be realized inside the two-dimensional plane, a closed three-manifold cannot be realized inside three-dimensional Euclidean space. Figure 6 shows three-manifolds with boundary; the solid bunny, for example, is topologically equivalent to the solid ball. These three-manifolds are represented by triangulations, that is, glued together from many tetrahedra. As the figure shows, the triangulation of the solid induces a triangulation of its two-dimensional boundary surface. The solid ball is in fact a three-dimensional topological disk; gluing two three-dimensional topological disks along their boundaries yields the three-dimensional sphere, just as gluing two two-dimensional disks along their boundaries yields the two-dimensional sphere. This, of course, already exceeds everyday experience.

2. The uniformization theorem for surfaces

For nearly a hundred years the Poincaré conjecture was the most fundamental problem in topology, and countless topologists and geometers gave their all to prove it. Compared with the lucky few who finally succeeded, the many who labored in obscurity and failed command even deeper respect. I once visited the School of Mathematics at Jilin University and heard about the life of Professor He Bohe, who remained obsessed with proving the Poincaré conjecture to the end of his days: painstaking, tireless, more determined after every setback, never letting go of it even as his life drew to a close. Professor He certainly did not strive his whole life for practical value or commercial gain, but out of curiosity about nature's mysteries and an ardent pursuit of aesthetic value. Such purity and nobility is the true engine of human progress!


Figure 7. A geodesic connecting two points on a human face surface.

Although the Poincaré conjecture is the most basic problem of topology, the essential breakthrough came from geometry. Given a topological manifold, say the combinatorial structure of the tetrahedral mesh in Figure 6, we may assign each edge a length so that every tetrahedron becomes a Euclidean tetrahedron; this defines a Riemannian metric. A Riemannian metric is a structure on a manifold that lets us determine the shortest geodesic between any two points; Figure 7 shows two geodesics on a face surface. The Riemannian metric naturally induces the curvature of the manifold, the precise description of how space bends. Given three points on a surface, connect them by geodesics into a geodesic triangle. If the surface is the Euclidean plane, the interior angles of a geodesic triangle sum to 180 degrees; on a sphere the sum exceeds 180 degrees, and on a saddle surface it falls short of 180 degrees. The difference between the angle sum and 180 degrees is the total curvature of the triangle. Given a topological manifold, then, can we choose the simplest possible Riemannian metric, one whose curvature is constant?
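In formulas: for a geodesic triangle T with interior angles α, β, γ on a surface with Gauss curvature K, the local Gauss-Bonnet theorem states exactly this relation between total curvature and angle excess:

$$\int_T K\,dA = \alpha + \beta + \gamma - \pi.$$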

Figure 8. The uniformization theorem: every closed surface can be deformed conformally into a surface of constant curvature.

The answer is yes. This is the most fundamental theorem of surface differential geometry, the uniformization theorem. It says that however endlessly the shapes of the world vary, they never escape one rule: every surface can be conformally transformed into one of three canonical surfaces, the unit sphere, the Euclidean plane, or the hyperbolic plane. The canonical spaces correspond to constant curvature +1, 0, and -1, as shown in Figure 8. A conformal transformation preserves local shape: locally it looks like a similarity transformation, and since similarities preserve angles, conformal maps are also called angle-preserving maps. Figure 9 shows a conformal transformation from a surface to the plane. The uniformization theorem asserts that every closed surface can be equipped with one of three geometries: spherical, Euclidean, or hyperbolic. Almost no important theorem in surface differential geometry gets around the uniformization theorem.


Figure 9. Conformal transformations preserve local shape.

3. Thurston's geometrization conjecture

To prove the Poincaré conjecture, the Fields medalist Thurston generalized the uniformization theorem to three-manifolds. Any three-manifold can be decomposed, by a standard procedure, into a sequence of simplest three-manifolds, the so-called prime manifolds; a prime manifold cannot be decomposed further, and the decomposition is essentially unique. Thurston proposed the earth-shattering geometrization conjecture: every prime three-manifold can be equipped with a canonical Riemannian metric realizing one of eight geometries. In particular, a simply connected three-manifold can be equipped with a metric of constant positive curvature, and a three-manifold with constant positive curvature must be the three-sphere. The Poincaré conjecture is therefore a special case of Thurston's geometrization conjecture.

Figure 10. Thurston's apple: the geometrization conjecture.

Figure 10 shows an instance of geometrization. Suppose we have an apple through which three worms have eaten three tunnels, as in the left frame, leaving a three-manifold with boundary. By the geometrization program, the interior of the eaten apple admits a hyperbolic Riemannian metric under which the boundary surface has curvature -1 everywhere. Embedding the apple, equipped with its hyperbolic metric, periodically into three-dimensional hyperbolic space produces the figure in the right frame.

4. Hamilton's Ricci flow

The essential breakthrough came from Hamilton's Ricci flow. Hamilton's idea came from the classical phenomenon of heat diffusion. Suppose we have a sheet-iron bunny whose surface temperature is uneven at the initial moment; as time passes the temperature evens out, and at thermal equilibrium it is constant. Hamilton's insight: let the Riemannian metric evolve in time, with its rate of change proportional to the curvature; then curvature diffuses like temperature, becoming gradually uniform until it is constant. As Figure 11 shows, an initial dumbbell surface evolves under the curvature flow, its curvature becomes more and more uniform, and in the end the curvature is constant and the surface has become a sphere.
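Written out (a standard formula, stated here for reference), Hamilton's flow evolves the Riemannian metric g_ij in the direction of its Ricci curvature R_ij; on surfaces, where the Ricci curvature reduces to the Gauss curvature K, the metric shrinks fastest where K is largest:

$$\frac{\partial g_{ij}}{\partial t} = -2\,R_{ij}, \qquad \text{on surfaces: } \frac{\partial g}{\partial t} = -2K\,g.$$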

Figure 11. The curvature flow makes the curvature more and more uniform until it is constant and the surface becomes a sphere.

For two-dimensional surfaces, Hamilton and Ben Chow proved that the curvature flow does deform every Riemannian metric into a metric of constant curvature, giving a constructive proof of the uniformization theorem. For three-manifolds, however, the Ricci flow met an enormous challenge. On a surface, the curvature at every point stays finite at every moment of the flow; on a three-manifold, the curvature at some point can tend to infinity within finite time. This is called curvature blowup, and a blowup point is called a singularity.

If blowup occurs, we can cut the manifold in two at the blowup point and continue the flow on each half. If we can prove that blowup happens only a finite number of times along the flow, then the manifold is cut into finitely many submanifolds, each of which eventually becomes a three-sphere. In that case the original manifold is glued together from finitely many spheres and is therefore a three-sphere, which proves the Poincaré conjecture. The fine analysis of singularities thus became the key to the whole problem. Hamilton clarified most types of singularities; Perelman resolved the remaining types. Perelman also saw, acutely, that Hamilton's Ricci flow is the gradient flow of a so-called entropy energy, which placed the flow within a variational framework. Perelman supplied the key ideas and the main outline of the proof, and the details were filled in by many mathematicians. With that, Thurston's geometrization conjecture was fully proved, and the Poincaré conjecture, after a century of exploration, was finally and completely settled.

5. Computational technology born of the Poincaré conjecture

The Poincaré conjecture itself is exceptionally abstract and dry: a simply connected closed 3-manifold is the three-dimensional sphere. It seems to have no practical value at all. But to prove it, mathematicians developed Thurston's geometrization program, invented Hamilton's Ricci flow, came to understand the topology and geometry of three-manifolds deeply, and brought the formation of singularities into mathematics' field of view. These advances in basic mathematics are bound to set off an "avalanche" in engineering and applied technology. For example, the Ricci flow technique in effect provides a powerful method for constructing Riemannian metrics from prescribed curvature.

The Ricci flow is a nonlinear geometric partial differential equation, and the Ricci flow method is a textbook case of geometric analysis: proving geometric results with PDE techniques. Geometric analysis was founded by Shing-Tung Yau, and the proof of the Poincaré conjecture is another enormous victory for it. Thurston had advocated relatively traditional topological and geometric methods, such as Teichmüller theory and hyperbolic geometry; other mathematicians favored more combinatorial methods; in the end, geometric analysis carried the day.

Hamilton's Ricci flow is defined on smooth manifolds, but in computer representations every manifold is discretized, so a theory of discrete Ricci flow is needed to support the corresponding computational methods. After many years of effort, the author and collaborators established the theory of Ricci flow on discrete surfaces, proving the existence and uniqueness of the discrete solution. Since almost every important problem in surface differential geometry must pass through the uniformization theorem, we believe discrete curvature flow algorithms will play an ever more important role in engineering practice [1].
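To give a concrete flavor of the discretization, here is a toy sketch in Python (my own illustration under the tangential circle-packing metric, not the authors' published algorithm): each vertex i of a closed triangle mesh carries a conformal factor u_i = log γ_i, edge lengths are l_ij = γ_i + γ_j, the discrete curvature K_i is the angle deficit 2π minus the incident corner angles, and the flow nudges each u_i toward a target curvature. On the boundary of a tetrahedron the target is π at each of the 4 vertices (total 4π, the discrete Gauss-Bonnet value for a sphere):

import math

# Combinatorics of a tetrahedron boundary: 4 vertices, 4 triangular faces.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
u = [0.0, 0.3, -0.2, 0.1]  # log circle radii (conformal factors), arbitrary start

def edge_length(i, j):
    # Tangential circle packing: edge length = sum of the two circle radii.
    return math.exp(u[i]) + math.exp(u[j])

def corner_angle(opp, adj1, adj2):
    # Corner angle opposite edge 'opp', by the law of cosines.
    return math.acos((adj1**2 + adj2**2 - opp**2) / (2 * adj1 * adj2))

def vertex_curvatures():
    K = [2 * math.pi] * 4  # discrete curvature = 2*pi minus incident angles
    for (i, j, k) in faces:
        lij, ljk, lki = edge_length(i, j), edge_length(j, k), edge_length(k, i)
        K[i] -= corner_angle(ljk, lij, lki)
        K[j] -= corner_angle(lki, lij, ljk)
        K[k] -= corner_angle(lij, ljk, lki)
    return K

target, step = math.pi, 0.1
for _ in range(200):
    K = vertex_curvatures()
    for i in range(4):
        u[i] += step * (target - K[i])  # gradient step toward target curvature

print([round(k, 6) for k in vertex_curvatures()])  # all four values converge to pi

After convergence the metric has constant discrete curvature: the discrete analogue of flowing the dumbbell of Figure 11 to the round sphere.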


Figure 12. Uniformization of surfaces with boundary, computed by discrete Ricci flow.

Figures 8 and 12 show uniformizations, computed by discrete Ricci flow, of closed surfaces and of surfaces with boundary. In essence these two figures encompass all surfaces that can arise in real life: every one of them is mapped conformally onto one of the three canonical surfaces, the sphere, the Euclidean plane, or the hyperbolic plane. This means that if we invent a new geometric algorithm that works on these three canonical surfaces, the algorithm works on all surfaces. Discrete curvature flow therefore greatly simplifies the design of geometric algorithms.

6. Precision medicine

The discrete curvature flow methods that grew out of the Poincaré conjecture are widely applied in precision medicine. The organs of the human body are essentially two-dimensional surfaces or three-dimensional manifolds, and curvature flow plays an irreplaceable role in analyzing and comparing their geometric features.


Figure 13. Virtual colonoscopy.

Virtual colonoscopy 

Rectal cancer is the number-four killer of men, behind only cardiovascular and cerebrovascular disease. From middle age on, everyone naturally develops rectal polyps, which grow year by year; once a polyp reaches a certain diameter, friction causes it to ulcerate, and long-term ulceration can turn cancerous. Polyps grow very slowly, however, usually taking seven or eight years from first appearance to the critical size, so monitoring them is vital to preventing rectal cancer. The middle-aged should have a colonoscopy every two years. Traditional colonoscopy requires general anesthesia and the insertion of an optical endoscope into the rectum. In the elderly the intestinal wall is fragile, so complications are more likely; moreover, the wall has many folds, and a polyp hiding in a fold can escape the doctor's view and be missed.

In recent years, North America and Japan have adopted virtual colonoscopy. The subject's rectum is imaged by tomographic scanning, and the rectal surface is reconstructed from the images, as in Figure 14. We need to unroll and flatten the rectal surface so that all the folds are exposed, making it easier to find polyps and measure their size. Also, as Figure 13 shows, in one examination the same subject's rectum is scanned twice, in different poses. The rectal surface is soft and elastic, and the surfaces from the two scans differ by a large elastic deformation; we must establish a smooth one-to-one map between them. Building a map between two surfaces in three dimensions is relatively hard, but once we flatten each surface into a planar rectangle, computing a map between planar regions becomes much simpler. Flattening the rectal surface is equivalent to endowing it with a Riemannian metric whose curvature is zero everywhere, which we can compute directly with the Ricci flow algorithm, as in Figure 14.

Figure 14. Flattening the rectal surface with Ricci flow.

Virtual colonoscopy is now widely used in North America and Japan (though not yet common in China), chiefly because it improves safety, lowers the miss rate, and cuts labor costs. Its spread has greatly raised the early detection rate of rectal cancer and lowered mortality, a great contribution to human health.


Figure 15. Virtual cystoscopy.

The same method applies to the bladder and other organs, as in Figure 15. The chief signs of bladder cancer are a thickening bladder wall and an inner wall that is no longer smooth but shows cauliflower-like geometric texture. Virtual cystoscopy can measure these symptoms quantitatively. Traditional cystoscopy forces the patient to endure great pain; the virtual method greatly reduces that pain, a major advantage.


Figure 16. Mapping the cerebral cortex conformally onto the unit sphere with Ricci flow, for ease of comparison.

Prevention and diagnosis of neurological diseases

Alzheimer's disease (commonly called senile dementia), epilepsy, childhood autism, and other neurological diseases gravely threaten human health, so their prevention and diagnosis have great practical significance. Magnetic resonance imaging lets us acquire the surface of the human cerebral cortex, as in Figure 16. Cortical geometry is extremely complex, with large numbers of folds, gyri, and sulci; these structures differ from person to person and change with age. Alzheimer's disease, for example, is often accompanied by atrophy of part of the cortex. To monitor the progress of the disease, we scan the patient's brain every few months and then compare, precisely, the cortical surfaces acquired at different times. Direct comparison in three-dimensional space is very difficult: it is all too easy to align different gyri incorrectly, and the algorithm falls into the trap of a local optimum. As Figure 16 shows, we instead map the cortical surface conformally onto the sphere and then build smooth maps between spheres, which is simpler and more precise. Mapping the cortex onto the sphere is equivalent to endowing the cortical surface with a Riemannian metric of curvature +1, which the Ricci flow method can produce.


Figure 17. Geometric analysis of the hippocampus.

If the cerebral cortex is the database, the hippocampus is the database's index, as in Figure 17. If the hippocampus is damaged, long-term memories can no longer be formed, and the long-term memories already in the brain can no longer be retrieved. Many neurological conditions can deform the hippocampus, among them epilepsy, drug abuse, and Alzheimer's disease, so quantitative comparison and classification of hippocampal geometry is very important. One precise method maps the hippocampus conformally onto the unit sphere; the area distortion factor then carries the complete information of the original Riemannian metric, and together with the mean curvature it preserves all the geometry of the hippocampal surface. In other words, we convert the hippocampal surface into two functions on the sphere (area distortion and mean curvature). Comparing different hippocampal surfaces on the sphere then measures their similarity precisely, enabling classification and diagnosis. Compared with traditional methods, this geometry-based diagnosis is more quantitative and precise.

Figure 18. Precise matching of facial surfaces.

Cosmetic surgery

In cosmetic surgery, evaluating the postoperative result is an important step, and it requires matching the pre- and post-operative facial surfaces precisely. As Figure 18 shows, we scanned two facial surfaces of the same person with different expressions and established a precise one-to-one map between them: every small circle on the calm face maps to the corresponding small ellipse on the smiling face, so we can measure the geometric change at corresponding points. Precise mapping between three-dimensional facial surfaces is thus a crucial technology in this field.


Figure 19. A three-dimensional facial surface mapped conformally onto the plane; the method used is Ricci flow.

As Figure 19 shows, we use the Ricci flow method to change the Riemannian metric of the facial surface into a flat metric of zero curvature, spread the three-dimensional face out on the plane, and then build smooth one-to-one maps between the planar regions, which in turn induce a one-to-one map between the three-dimensional faces. The same method can of course be used for three-dimensional face recognition, though recognition demands far less precision of the mapping.

In other areas of precision medicine, such as orthodontics, artificial heart valves, artificial bones, real-time monitoring of radiotherapy, and liver surgery planning, the various organs of the body must be imaged, geometrically reconstructed, and analyzed for features, and the Ricci flow method plays an important role in all of them.

7. Summary and outlook

The Poincaré conjecture itself is pure and abstract: a simply connected closed three-manifold is the three-sphere. The conjecture seems to have no practical value whatever, and its simple, intuitive statement often strikes people outside mathematics as much ado about nothing. But the rigor that pure mathematics demands drove generations of topologists and geometers to give their lives to it, until the proof was at last completed through the joint efforts of many mathematicians. For two-dimensional surfaces, the geometrization theorem (the uniformization theorem) took a hundred years to go from proof to algorithm; for three-manifolds, Thurston's geometrization program went from proof to algorithm almost simultaneously. The topological theory and the computational theory of three-manifolds have been deeply intertwined from the very start. This shows that in the modern era, as computing develops, the cycle from pure theory to applied algorithm grows ever shorter.

At the same time, we have seen that in the course of proving the Poincaré conjecture, Thurston's geometrization program laid out the entire landscape of three-manifolds, Hamilton's Ricci flow provided a powerful tool for constructing Riemannian metrics from curvature, and the singularity theory of Hamilton and Perelman broke through what earlier theory had declared off-limits. The author, together with many mathematicians, has developed the theory and algorithms of discrete Ricci flow and systematically applied curvature flow in many engineering and medical fields. In practice we have learned, deeply, that in many key applications nothing else can replace the curvature flow method. Applications of Ricci flow on two-dimensional surfaces are now gradually spreading in practice. Its applications on three-manifolds are deeper, subtler, and more powerful, yet so far there are none, partly because three-manifolds lie far beyond everyday experience, and partly because, compared with surface differential geometry, the topology and geometry of three-manifolds are far from widely known. But as a faithful portrait of natural truth, the topology and geometry of three-manifolds will sooner or later prevail in practice. The avalanche set off by the Poincaré conjecture will, in the end, rewrite the course of history.

When Poincaré posed his topological conjecture, when Thurston discerned the basic geometric structures of three-manifolds, when Hamilton conceived the Ricci flow, when Perelman saw that Hamilton's flow is in essence the gradient flow of a so-called entropy energy, what they pursued was the grandeur of geometric structure and the depth of natural truth. They would never take practical value as their ultimate goal. The accumulation of practical technique usually brings only evolution; the satisfaction of curiosity can bring true revolution. May more young people, amid the dust and clamor of the world, keep their sincerity and innocence, their intense curiosity, their sensitivity to the beauty of nature, and their passion for scientific truth!

(Readers interested in the theory, algorithms, and applications of discrete surface Ricci flow may consult the monograph [1].)

References

[1] W. Zeng and X. Gu, Ricci Flow for Shape Analysis and Surface Registration: Theories, Algorithms and Applications. Springer, 2013.

Further reading

① What would we lose without basic science?  ② Video: What is topology?

③ Pure mathematics steps out of the ivory tower: what do Shing-Tung Yau and 3D technology have to do with each other?

Essentials of Machine Learning Algorithms (with Python and R Codes)

Source: http://www.analyticsvidhya.com/blog/2015/08/common-machine-learning-algorithms/

Introduction

Google’s self-driving cars and robots get a lot of press, but the company’s real future is in machine learning, the technology that enables computers to get smarter and more personal.

– Eric Schmidt (Google Chairman)

We are probably living in the most defining period of human history. The period when computing moved from large mainframes to PCs to cloud. But what makes it defining is not what has happened, but what is coming our way in years to come.

What makes this period exciting for someone like me is the democratization of the tools and techniques, which followed the boost in computing. Today, as a data scientist, I can build data-crunching machines with complex algorithms for a few dollars per hour. But reaching here wasn’t easy! I had my dark days and nights.

 

Who can benefit the most from this guide?

What I am giving out today is probably the most valuable guide I have ever created.

The idea behind creating this guide is to simplify the journey of aspiring data scientists and machine learning enthusiasts across the world. Through this guide, I will enable you to work on machine learning problems and gain from experience. I am providing a high level understanding about various machine learning algorithms along with R & Python codes to run them. These should be sufficient to get your hands dirty.


I have deliberately skipped the statistics behind these techniques, as you don’t need to understand them at the start. So, if you are looking for statistical understanding of these algorithms, you should look elsewhere. But, if you are looking to equip yourself to start building machine learning project, you are in for a treat.

 

Broadly, there are 3 types of machine learning algorithms.

1. Supervised Learning

How it works: This algorithm consists of a target / outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. Examples of Supervised Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression etc.

 

2. Unsupervised Learning

How it works: In this algorithm, we do not have any target or outcome variable to predict / estimate. It is used for clustering a population into different groups, which is widely used for segmenting customers into different groups for specific interventions. Examples of Unsupervised Learning: Apriori algorithm, K-means.

 

3. Reinforcement Learning:

How it works:  Using this algorithm, the machine is trained to make specific decisions. It works this way: the machine is exposed to an environment where it trains itself continually using trial and error. This machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions. Example of Reinforcement Learning: Markov Decision Process

List of Common Machine Learning Algorithms

Here is the list of commonly used machine learning algorithms. These algorithms can be applied to almost any data problem:

  1. Linear Regression
  2. Logistic Regression
  3. Decision Tree
  4. SVM
  5. Naive Bayes
  6. KNN
  7. K-Means
  8. Random Forest
  9. Dimensionality Reduction Algorithms
  10. Gradient Boost & Adaboost

1. Linear Regression

It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s). Here, we establish relationship between independent and dependent variables by fitting a best line. This best fit line is known as regression line and represented by a linear equation Y= a *X + b.

The best way to understand linear regression is to recall an experience from childhood. Let us say you ask a child in fifth grade to arrange the people in his class in increasing order of weight, without asking them their weights! What do you think the child will do? He or she would likely look (visually analyze) at the height and build of people and arrange them using a combination of these visible parameters. This is linear regression in real life! The child has actually figured out that height and build are correlated with weight by a relationship, which looks like the equation above.

In this equation:

  • Y – Dependent Variable
  • a – Slope
  • X – Independent variable
  • b – Intercept

These coefficients a and b are derived by minimizing the sum of the squared differences between the data points and the regression line.
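In symbols, a and b are the minimizers of the squared residuals:

$$\min_{a,\,b}\ \sum_{i=1}^{n} \bigl(y_i - (a\,x_i + b)\bigr)^2.$$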

Look at the below example. Here we have identified the best fit line having linear equation y=0.2811x+13.9. Now using this equation, we can find the weight, knowing the height of a person.


Linear regression is mainly of two types: simple linear regression and multiple linear regression. Simple linear regression is characterized by one independent variable, while multiple linear regression (as the name suggests) is characterized by multiple (more than 1) independent variables. When finding the best fit line, you can also fit a polynomial or curvilinear regression line, and these are known as polynomial or curvilinear regression.

Python Code

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import linear_model
#Load Train and Test datasets
#Identify feature and response variable(s) and values must be numeric and numpy arrays
x_train=input_variables_values_training_datasets
y_train=target_variables_values_training_datasets
x_test=input_variables_values_test_datasets
# Create linear regression object
linear = linear_model.LinearRegression()
# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
#Equation coefficient and Intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
#Predict Output
predicted= linear.predict(x_test)

R Code

#Load Train and Test datasets
#Identify feature and response variable(s); values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
#Predict Output
predicted= predict(linear,x_test) 

 

2. Logistic Regression

Don’t get confused by its name! It is a classification, not a regression, algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s). In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function. Hence, it is also known as logit regression. Since it predicts the probability, its output values lie between 0 and 1 (as expected).

Again, let us try and understand this through a simple example.

Let’s say your friend gives you a puzzle to solve. There are only 2 outcome scenarios – either you solve it or you don’t. Now imagine that you are being given a wide range of puzzles / quizzes in an attempt to understand which subjects you are good at. The outcome of this study would be something like this – if you are given a trigonometry-based tenth-grade problem, you are 70% likely to solve it. On the other hand, if it is a fifth-grade history question, the probability of getting an answer is only 30%. This is what Logistic Regression provides you.

Coming to the math, the log odds of the outcome is modeled as a linear combination of the predictor variables.

odds= p/ (1-p) = probability of event occurrence / probability of not event occurrence
ln(odds) = ln(p/(1-p))
logit(p) = ln(p/(1-p)) = b0+b1X1+b2X2+b3X3....+bkXk

Above, p is the probability of presence of the characteristic of interest. It chooses parameters that maximize the likelihood of observing the sample values, rather than parameters that minimize the sum of squared errors (as in ordinary regression).

Now, you may ask, why take a log? For the sake of simplicity, let’s just say that this is one of the best mathematical ways to replicate a step function. I could go into more detail, but that would defeat the purpose of this article.
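Concretely, inverting the logit recovers the familiar S-shaped (sigmoid) curve, which squashes any linear score into the interval (0, 1):

$$p = \frac{1}{1 + e^{-(b_0 + b_1X_1 + \dots + b_kX_k)}}.$$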

Python Code

#Import Library
from sklearn.linear_model import LogisticRegression
#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create logistic regression object
model = LogisticRegression()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Equation coefficient and Intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
#Predict Output
predicted= model.predict(x_test)

R Code

x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
logistic <- glm(y_train ~ ., data = x, family='binomial')
summary(logistic)
#Predict Output
predicted= predict(logistic,x_test)

 

Furthermore...

There are many different steps that could be tried in order to improve the model, such as including interaction terms, removing features, applying regularization techniques, or using a non-linear model.

 

3. Decision Tree

This is one of my favorite algorithms and I use it quite frequently. It is a type of supervised learning algorithm that is mostly used for classification problems. Surprisingly, it works for both categorical and continuous dependent variables. In this algorithm, we split the population into two or more homogeneous sets. This is done based on the most significant attributes / independent variables, to make the groups as distinct as possible. For more details, you can read: Decision Tree Simplified.

(image source: statsexchange)

In the image above, you can see that population is classified into four different groups based on multiple attributes to identify ‘if they will play or not’. To split the population into different heterogeneous groups, it uses various techniques like Gini, Information Gain, Chi-square, entropy.

The best way to understand how a decision tree works is to play Jezzball – a classic game from Microsoft (image below). Essentially, you have a room with moving walls and you need to create walls such that the maximum area gets cleared off without the balls.


So, every time you split the room with a wall, you are trying to create 2 different populations with in the same room. Decision trees work in very similar fashion by dividing a population in as different groups as possible.

More: Simplified Version of Decision Tree Algorithms

Python Code

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import tree
#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create tree object 
model = tree.DecisionTreeClassifier(criterion='gini') # for classification, here you can change the algorithm as gini or entropy (information gain) by default it is gini  
# model = tree.DecisionTreeRegressor() for regression
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Predict Output
predicted= model.predict(x_test)

R Code

library(rpart)
x <- cbind(x_train, y_train)
# grow tree 
fit <- rpart(y_train ~ ., data = x, method="class")
summary(fit)
#Predict Output 
predicted= predict(fit,x_test)

 

4. SVM (Support Vector Machine)

It is a classification method. In this algorithm, we plot each data item as a point in n-dimensional space (where n is number of features you have) with the value of each feature being the value of a particular coordinate.

For example, if we only had two features like Height and Hair length of an individual, we’d first plot these two variables in two dimensional space where each point has two co-ordinates (these co-ordinates are known as Support Vectors)


Now, we will find some line that splits the data between the two differently classified groups of data. This will be the line such that the distances from the closest point in each of the two groups will be farthest away.


In the example shown above, the line which splits the data into two differently classified groups is the black line, since the two closest points are the farthest apart from the line. This line is our classifier. Then, depending on where the testing data lands on either side of the line, that’s what class we can classify the new data as.

More: Simplified Version of Support Vector Machine

Think of this algorithm as playing JezzBall in n-dimensional space. The tweaks in the game are:

  • You can draw lines / planes at any angles (rather than just horizontal or vertical as in classic game)
  • The objective of the game is to segregate balls of different colors in different rooms.
  • And the balls are not moving.

 

Python Code

#Import Library
from sklearn import svm
#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create SVM classification object 
model = svm.SVC()  # there are various options associated with it; this is simple classification. You can refer to the link for more detail.
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Predict Output
predicted= model.predict(x_test)

R Code

library(e1071)
x <- cbind(x_train, y_train)
# Fitting model
fit <- svm(y_train ~ ., data = x)
summary(fit)
#Predict Output 
predicted= predict(fit,x_test)

 

5. Naive Bayes

It is a classification technique based on Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier would consider all of these properties to independently contribute to the probability that this fruit is an apple.

Naive Bayesian model is easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.

Bayes theorem provides a way of calculating posterior probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:
P(c|x) = P(x|c) * P(c) / P(x)

Here,

  • P(c|x) is the posterior probability of class (target) given predictor (attribute).
  • P(c) is the prior probability of class.
  • P(x|c) is the likelihood which is the probability of predictor given class.
  • P(x) is the prior probability of predictor.

Example: Let’s understand it using an example. Below I have a training data set of weather and corresponding target variable ‘Play’. Now, we need to classify whether players will play or not based on weather condition. Let’s follow the below steps to perform it.

Step 1: Convert the data set to frequency table

Step 2: Create Likelihood table by finding the probabilities like Overcast probability = 0.29 and probability of playing is 0.64.


Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of prediction.

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the method discussed above: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)

Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64

Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher probability, so the predicted outcome is Yes.

Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and with problems having multiple classes.

Python Code

#Import Library
from sklearn.naive_bayes import GaussianNB
#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create Naive Bayes classification object
model = GaussianNB()  # there are other distributions for multinomial classes, like Bernoulli Naive Bayes; refer link
# Train the model using the training sets and check score
model.fit(X, y)
#Predict Output
predicted= model.predict(x_test)

R Code

library(e1071)
x # Fitting model
fit <-naiveBayes(y_train ~ ., data = x)
summary(fit)
#Predict Output 
predicted= predict(fit,x_test)

 

6. KNN (K- Nearest Neighbors)

It can be used for both classification and regression problems. However, it is more widely used in classification problems in the industry. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors. The case is assigned to the class most common amongst its K nearest neighbors, as measured by a distance function.

These distance functions can be Euclidean, Manhattan, Minkowski and Hamming distance. The first three are used for continuous variables and the fourth (Hamming) for categorical variables. If K = 1, then the case is simply assigned to the class of its nearest neighbor. At times, choosing K turns out to be a challenge while performing KNN modeling.
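For reference, for points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) the first two distances are:

$$d_{Euclidean}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad d_{Manhattan}(x, y) = \sum_{i=1}^{n} |x_i - y_i|.$$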

More: Introduction to k-nearest neighbors : Simplified.


KNN can easily be mapped to our real lives. If you want to learn about a person, of whom you have no information, you might like to find out about his close friends and the circles he moves in and gain access to his/her information!

Things to consider before selecting KNN:

  • KNN is computationally expensive
  • Variables should be normalized else higher range variables can bias it
  • Work more on the pre-processing stage before going for KNN, e.g. outlier and noise removal

Python Code

#Import Library
from sklearn.neighbors import KNeighborsClassifier
#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create KNeighbors classifier object 
model = KNeighborsClassifier(n_neighbors=6) # default value for n_neighbors is 5
# Train the model using the training sets and check score
model.fit(X, y)
#Predict Output
predicted= model.predict(x_test)

R Code

library(class)
# Fitting model: knn() from the 'class' package fits and predicts in one call
#Predict Output 
predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)
summary(predicted)

 

7. K-Means

It is a type of unsupervised algorithm which solves the clustering problem. Its procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters). Data points inside a cluster are homogeneous, and heterogeneous to peer clusters.

Remember figuring out shapes from ink blots? K-means is somewhat similar to this activity. You look at the shape and spread to decipher how many different clusters / populations are present!


How K-means forms cluster:

  1. K-means picks k points, known as centroids, one for each cluster.
  2. Each data point forms a cluster with the closest centroids i.e. k clusters.
  3. Finds the centroid of each cluster based on existing cluster members. Here we have new centroids.
  4. As we have new centroids, repeat steps 2 and 3. Find the closest distance for each data point from the new centroids and get associated with the new k clusters. Repeat this process until convergence occurs, i.e. the centroids no longer change.

How to determine value of K:

In K-means, we have clusters, and each cluster has its own centroid. The sum of the squared differences between the centroid and the data points within a cluster constitutes the within-cluster sum of squares for that cluster. And when the within-cluster sums of squares of all the clusters are added, the total becomes the within-cluster sum of squares for the cluster solution.

We know that as the number of clusters increases, this value keeps decreasing, but if you plot the result you may see that the sum of squared distance decreases sharply up to some value of k, and then much more slowly after that. Here, we can find the optimum number of clusters.
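As a quick illustration (a minimal sketch on synthetic data; the variable names are my own), this elbow can be read off scikit-learn's inertia_ attribute, which is exactly the total within-cluster sum of squares:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)  # synthetic data, for illustration only
wss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, random_state=0).fit(X)
    wss.append(km.inertia_)  # inertia_ = total within-cluster sum of squares
print(wss)  # look for the "elbow" where the decrease suddenly slows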


Python Code

#Import Library
from sklearn.cluster import KMeans
#Assumed you have, X (attributes) for training data set and x_test(attributes) of test_dataset
# Create KMeans object 
k_means = KMeans(n_clusters=3, random_state=0)
# Train the model using the training sets and check score
k_means.fit(X)
#Predict Output
predicted= k_means.predict(x_test)

R Code

library(cluster)
fit <- kmeans(X, 3)  # 3-cluster solution

 

8. Random Forest

Random Forest is a trademarked term for an ensemble of decision trees. In Random Forest, we have a collection of decision trees (hence a “forest”). To classify a new object based on its attributes, each tree gives a classification, and we say the tree “votes” for that class. The forest chooses the classification having the most votes (over all the trees in the forest).

Each tree is planted & grown as follows:

  1. If the number of cases in the training set is N, then sample of N cases is taken at random but with replacement. This sample will be the training set for growing the tree.
  2. If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest is grown.
  3. Each tree is grown to the largest extent possible. There is no pruning.

For more details on this algorithm, comparing with decision tree and tuning model parameters, I would suggest you to read these articles:

  1. Introduction to Random forest – Simplified

  2. Comparing a CART model to Random Forest (Part 1)

  3. Comparing a Random Forest to a CART model (Part 2)

  4. Tuning the parameters of your Random Forest model

Python

#Import Library
from sklearn.ensemble import RandomForestClassifier
#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create Random Forest object
model= RandomForestClassifier()
# Train the model using the training sets and check score
model.fit(X, y)
#Predict Output
predicted= model.predict(x_test)

R Code

library(randomForest)
x <- cbind(x_train, y_train)
# Fitting model
fit <- randomForest(y_train ~ ., data = x, ntree = 500)
summary(fit)
#Predict Output 
predicted= predict(fit,x_test)

 

9. Dimensionality Reduction Algorithms

In the last 4-5 years, there has been an exponential increase in data capture at every possible stage. Corporates, government agencies, and research organisations are not only coming up with new data sources, they are also capturing data in great detail.

For example: e-commerce companies are capturing more details about customers, like their demographics, web crawling history, what they like or dislike, purchase history, feedback and many others, to give them more personalized attention than your nearest grocery shopkeeper can.

As a data scientist, the data we are offered also consists of many features. This sounds good for building a robust model, but there is a challenge: how do you identify the highly significant variable(s) out of 1000 or 2000? In such cases, dimensionality reduction algorithms help us, along with various other methods like decision trees, random forest, PCA, factor analysis, identification based on the correlation matrix, missing value ratio, and others.

To know more about these algorithms, you can read the “Beginners Guide To Learn Dimension Reduction Techniques”.

Python Code

#Import Library
from sklearn import decomposition
#Assumed you have training and test data set as train and test
# Create PCA object
pca = decomposition.PCA(n_components=k)  # default value of k = min(n_sample, n_features)
# For Factor analysis
#fa= decomposition.FactorAnalysis()
# Reduced the dimension of training dataset using PCA
train_reduced = pca.fit_transform(train)
#Reduced the dimension of test dataset
test_reduced = pca.transform(test)
#For more detail on this, please refer to this link.

R Code

library(stats)
pca <- princomp(train, cor = TRUE)
train_reduced <- predict(pca, train)
test_reduced <- predict(pca, test)

 

10. Gradient Boosting & AdaBoost

GBM & AdaBoost are boosting algorithms used when we deal with plenty of data and need to make a prediction with high predictive power. Boosting is an ensemble learning algorithm which combines the predictions of several base estimators in order to improve robustness over a single estimator. It combines multiple weak or average predictors to build a strong predictor. These boosting algorithms always work well in data science competitions like Kaggle, AV Hackathon, CrowdAnalytix.

More: Know about Gradient and AdaBoost in detail

Python Code

#Import Library
from sklearn.ensemble import GradientBoostingClassifier
#Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset
# Create Gradient Boosting Classifier object
model= GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
# Train the model using the training sets and check score
model.fit(X, y)
#Predict Output
predicted= model.predict(x_test)

R Code

library(caret)
x <- cbind(x_train, y_train)
# Fitting model
fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4)
fit <- train(y_train ~ ., data = x, method = "gbm", trControl = fitControl, verbose = FALSE)
predicted= predict(fit,x_test,type= "prob")[,2] 

GradientBoostingClassifier and Random Forest are two different ensemble tree classifiers, and people often ask about the difference between these two algorithms (in short, Random Forest relies on bagging, while GBM relies on boosting).

End Notes

By now, I am sure, you would have an idea of commonly used machine learning algorithms. My sole intention behind writing this article and providing the codes in R and Python is to get you started right away. If you are keen to master machine learning, start right away. Take up problems, develop a physical understanding of the process, apply these codes and see the fun!

Did you find this article useful ? Share your views and opinions in the comments section below.

If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on twitter or like our facebook page.

Some lesser-known truths about programming

Source: http://automagical.rationalmind.net/2010/08/17/some-lesser-known-truths-about-programming/

My experience as a programmer  has taught me a few things about writing software. Here are some things that people might find surprising about writing code:

  • Averaging over the lifetime of the project, a programmer spends about 10-20% of his time writing code, and most programmers write about 10-12 lines of code per day that go into the final product, regardless of their skill level. Good programmers spend much of the other 90% thinking, researching, and experimenting to find the best design. Bad programmers spend much of that 90% debugging code by randomly making changes and seeing if they work.
  • A good programmer is ten times more productive than an average programmer. A great programmer is 20-100 times more productive than the average. This is not an exaggeration – studies since the 1960’s have consistently shown this. A bad programmer is not just unproductive – he will not only fail to get any work done, but will create a lot of work and headaches for others to fix. “A great lathe operator commands several times the wage of an average lathe operator, but a great writer of software code is worth 10,000 times the price of an average software writer.” –Bill Gates
  • Great programmers spend little of their time writing code – at least code that ends up in the final product. Programmers who spend much of their time writing code are too lazy, too ignorant, or too arrogant to find existing solutions to old problems. Great programmers are masters at recognizing and reusing common patterns. Good programmers are not afraid to refactor (rewrite) their code  to reach the ideal design. Bad programmers write code which lacks conceptual integrity, non-redundancy, hierarchy, and patterns, and so is very difficult to refactor. It’s easier to throw away bad code and start over than to change it.
  • Software development obeys the laws of entropy, like any other process. Continuous change leads to software rot, which erodes the conceptual integrity of the original design. Software rot is unavoidable, but programmers who fail to take conceptual integrity into consideration create software that rots so fast that it becomes worthless before it is even completed. Entropic failure of conceptual integrity is probably the most common reason for software project failure. (The second most common reason is delivering something other than what the customer wanted.) Software rot slows down progress exponentially, so many projects face exploding timelines and budgets before they are mercifully killed.
  • A 2004 study found that most software projects (51%) will fail in a critical aspect, and 15% will fail totally. This is an improvement since 1994, when 31% failed.
  • Although most software is made by teams, it is not a democratic activity. Usually, just one person is responsible for the design, and the rest of the team fills in the details.
  • Programming is hard work. It’s an intense mental activity. Good programmers think about their work 24/7. They write their most important code in the shower and in their dreams. Because the most important work is done away from a keyboard, software projects cannot be accelerated by spending more time in the office or adding more people to a project.

What Kind of Profession Is the Front-End Engineer?

2016-03-20 吕大豹, 前端开发 (Front-End Development)

From: 医小生与程序猿 (WeChat ID: doctor_programmer)

Link: http://www.cnblogs.com/lvdabao/p/5229640.html

The front-end engineer is no longer an unfamiliar role in the software industry, even though the role is only a few years old. As a front-end engineer three years into the job, I will try to explain this profession by combining industry norms with my own understanding. The intended readers of this article: software developers outside the web, product managers and everyone else tied to product, bosses agonizing over whether they need to hire a front-end, front-end novices just starting work, and all the folks who are simply curious about the front end.

The English name is front-end engineer, abbreviated FE, which I will use below. Hiring of FEs in China probably began around 2011. Before that, FE work was mostly handled by server-side engineers, or designers produced the HTML pages. So what gave rise to the FE position? This article discusses the profession through the FE's work content and the skills and qualities a professional FE should possess.

The one who wields user experience

The FE's primary job is building user interfaces; in a web system, that means web pages. Why do web pages need dedicated FEs to write them? The answer is "user experience." With the spread of the web 2.0 idea and the arrival of web 3.0, users have become the internet's main producers, and web pages carry more and more functionality.

On one hand, companies' demand for "user experience" is strong. This is easy to understand: if your product looks like a phishing site and is painful to use besides, some users will leave you. Non-internet companies face the same thing. You may put enormous effort into optimizing the database and the server load, yet your customers can hardly perceive it. If your system's interface still has a 1980s-90s look, the customer's first impression is that the system is no good, and they won't buy it. Spend a little time building a fresh new interface instead, and the first impression becomes: this system is slick and high-tech. Don't underestimate that first impression; for a layperson it is often decisive. Many companies have realized this, and their demand for user experience has risen accordingly.

On the other hand, today's users are picky. The products they use are each flashier than the last; they have been spoiled. The moment your product annoys them even slightly, they go "advertise" it for you on Weibo.

The front-end engineer is the keeper of user experience. After the product manager conceives the interaction prototype and the designer works out the interaction details, the FE types those ideas into code line by line. Every button and every image the FE produces is clicked by thousands upon thousands of users; the FE and the user are in "zero-distance contact." As the implementer of the product's interaction, the FE must master HTML and CSS, but what the role really demands goes beyond technology.

An FE needs a deep understanding of user experience. Take a hyperlink on a page: if the font is small, users may fail to hit it, because the clickable region hugs the edge of the text. The front end can enlarge that clickable region with very simple techniques, making the link easier to hit. That is user experience; as Designing the Moment (《瞬间之美》) puts it, touching the user's heart takes only an instant. Understanding user experience also shows in a grasp of interaction conventions. A user may feel that some interface is nimble without being able to say exactly why; most likely its buttons have well-designed four states (normal, hover, pressed, disabled) that give feedback on every action.

An FE who understands user experience lets the work itself communicate with users and touch the soft spot in their hearts.

An FE needs a touch of obsessiveness, shown as zero tolerance for any blemish: using technical means to make page scrolling smoother, reducing visual jitter, aligning positions at the pixel level, and not letting the phone pop up the dialer when the user touches a string of digits that is not a phone number, and so on. Many of these details are invisible to product managers, because they are scattered technical tricks that only an FE accumulates drop by drop. The real perfectionists go further still, chasing a few milliseconds off the page's response time, a few KB off your phone's data usage, a little less battery drain. Even users cannot perceive these, but once you have a million or ten million users, the value shows.

A front-end engineer needs to be a person of fine-grained thought, with a feel for beauty, a stubborn pursuit of perfection, taste, ideas, a sense of the big picture, and ideally a little psychology.

Client-side business logic

Producing an elegant interface is only the FE's first step; programming is also a required skill, for the FE carries the task of handling client-side business logic. In the past, the client side was just an IE browser, with no business logic to speak of. Not any more: users publish articles and socialize through the browser, and the more advanced among them use online tools to get their work done.

JavaScript is the programming language the FE must master. An FE should know the language's strengths and weaknesses and command its programming idioms and development patterns, using every technique to build ever richer interactive interfaces, while also communicating with server-side engineers and debugging interfaces to complete the whole cycle: render the page, respond to user actions, submit user data, and report the result of the operation.

On this point, a front-end engineer needs a foundation in software development and an understanding of how computers and network communication work, so FEs with a computer science background have something of an edge.

The front end needs architecture too

Architecture, for writing web pages? What is there to architect? To answer, first be clear on one thing: FE work is no longer as simple as "writing pages." As front-end codebases have grown, development patterns such as modular development, MVC, and MVVM have emerged one after another, and teams have evolved from solo work to team development.

So a senior front-end engineer must have architectural ability, including but not limited to:

  • understanding and integrating existing, proven frameworks
  • shaping a development model suited to the project's business characteristics
  • designing front-end testing schemes to guarantee code quality
  • organizing the team's development workflow with engineering methods.

Extending forward, extending backward

The Internet of Things market keeps heating up, and the phone is a key node in the IoT system. The FE's battlefield is no longer just the browser; in the future it will cover all kinds of "end devices." Thanks to the flexibility of JavaScript, the language can already be used to build Windows applications, iOS applications, Android applications, and smart-TV applications; tomorrow it may be VR, wearables, or smart appliances. That is the direction in which the front end can extend forward.

Meanwhile, with the advent of Node.js, JavaScript has magically acquired server-side capability: what used to be done with Java or PHP can now be done in JS as well. The front-end camp already includes many people who came over from the back end with server-side experience; armed with Node.js, and given today's market demand for shipping products fast and developing nimbly, the FE's path of backward extension is broad and bright. In fact, the concept of the full-stack engineer was proposed the year before last; industry leaders like BAT have long used Node.js to build infrastructure, and many small, fast startups use Node.js for rapid iterative development.

Keep learning

Technology in the front-end field turns over much faster than in other fields, probably because this field sits closest to the user. Some new technologies are outright disruptive, and the front-end engineer must keep pace with the times, or the products you build will trail others in experience.

Some requests raised by marketing get rejected because the product manager, judging by years of experience, deems them impossible. In fact, as new technologies appear, some features you thought impossible can now be implemented in the front end. As HTML5 support keeps improving, the front end grows ever more capable. With canvas, for instance, you can read every pixel of an image, which gives the front end image-processing power; with the FileReader API the front end can read local files; there is geolocation, and more.

These new things are exactly what the front-end engineer must keep learning. A competent FE must maintain the ability to learn continuously and keep a sharp nose for new technology. "Live to old age, learn to old age" describes the front-end engineer.

The high-EQ programmer

Most people picture programmers as low in emotional intelligence and poor at small talk. The front-end engineer should be an exception; the nature of the work decides it.

In the workflow, the FE sits downstream of the designer, taking design mockups and turning them into web pages, and upstream of the back-end engineer, submitting the data users produce to the server. Horizontally, the FE is in close contact with the product manager, discussing interaction details at any moment. A role that connects every other member of the team needs to be both glue and lubricant.

The FE needs strong communication and comprehension skills. We often joke that "designers live in fairy tales," because sometimes their designs defy convention and simply cannot be implemented. At such moments you must patiently explain the principles and the reasons to the designer and tell her which basic norms a design should follow. You must grasp the product manager's thinking precisely, understanding what all that gesturing is actually asking for. And when dealing with back-end engineers, you instantly become the programming geek, talking data types, object orientation, and design patterns.

You must be able to switch roles at any moment, switching your manner of expression and your topic of conversation. So you have to be a high-EQ programmer.

That is my understanding of the front-end engineer. The entry bar is low, but becoming a professional front-end engineer requires mastering a great deal. Beyond front-end technology, I believe what matters more is overall ability, including the fine-grained thinking, taste, ideas, and high EQ discussed above. After all, you reach users through your code and must bring them delight. In a certain sense, you have to be a good lover.

Inside The Mind That Built Google Brain: On Life, Creativity, And Failure

Source: The Huffington Post

(Photo: Jemal Countess/Getty)

Here’s a list of universities with arguably the greatest computer science programs: Carnegie Mellon, MIT, UC Berkeley, and Stanford. These are the same places, respectively, where Andrew Ng received his bachelor’s degree, his master’s, his Ph.D., and has taught for 12 years.

Ng is an icon of the artificial intelligence world with the pedigree to match, and he is not yet 40 years old. In 2011, he founded Google Brain, a deep-learning research project supercharged by Google’s vast stores of computing power and data. Delightfully, one of its most important achievements came when computers analyzing scores of YouTube screenshots were able to recognize a cat. (The New York Times headline: “How Many Computers to Identify a Cat? 16,000.”) As Ng explained, “The remarkable thing was that [the system] had discovered the concept of a cat itself. No one had ever told it what a cat is. That was a milestone in machine learning.”

Ng exudes a cheerful but profound calm. He happily discusses the various mistakes and failures of his career, the papers he read but didn’t understand. He wears identical blue oxford shirts each and every day. He is blushing but proud when a colleague mentions his adorable robot-themed engagement photo shoot with his now-wife, a surgical roboticist named Carol Reiley (note his shirt in the photo).

One-on-one, he speaks with a softer voice than anyone you know, though this has not hindered his popularity as a lecturer. In 2011, when he posted videos from his own Stanford machine learning course on the web, over 100,000 people registered. Within a year, Ng had co-founded Coursera, which is today the largest provider of open online courses. Its partners include Princeton and Yale, top schools in China and across Europe. It is a for-profit venture, though all classes are accessible for free. “Charging for content would be a tragedy,” Ng has said.

(Photo: Colson Griffith)

Then, last spring, a shock. Ng announced he was departing Google and stepping away from day-to-day involvement at Coursera. The Chinese tech giant Baidu was establishing an ambitious $300 million research lab devoted to artificial intelligence just down the road from Google’s Silicon Valley headquarters, and Andrew Ng would head it up.

At Baidu, as before, Ng is trying to help computers identify audio and images with incredible accuracy, in realtime. (On Tuesday, Baidu announced it had achieved the world’s best results on a key artificial intelligence benchmark related to image identification, besting Google and Microsoft.) Ng believes speech recognition with 99 percent accuracy will spur revolutionary changes to how humans interact with computers, and how operating systems are designed. Simultaneously, he must help Baidu work well for the millions of search users who are brand new to digital life. “You get queries [in China] that you just wouldn’t get in the United States,” Ng explained. “For example, we get queries like, ‘Hi Baidu, how are you? I ate noodles at a corner store last week and they were delicious. Do you think they’re on sale this weekend?’ That’s the query.” Ng added: “I think we make a good attempt at answering.”

Elon Musk and Stephen Hawking have been sounding alarms over the potential threat to humanity from advanced artificial intelligence. Andrew Ng has not. “I don’t work on preventing AI from turning evil for the same reason that I don’t work on combating overpopulation on the planet Mars,” he has said. AI is many decades away (if not longer) from achieving something akin to consciousness, according to Ng. In the meantime, there’s a far more urgent problem. Computers enhanced by machine learning are eliminating jobs long done by humans. The trend is only accelerating, and Ng frequently calls on policymakers to prepare for the socioeconomic consequences.

At Baidu’s new lab in Sunnyvale, Calif., we spoke to Andrew Ng for Sophia, a HuffPost project to collect life lessons from fascinating people. He explained why he thinks “follow your passion” is terrible career advice and he shared his strategy for teaching creativity; Ng discussed his failures and his helpful habits, the most influential books he’s read, and his latest thoughts on the frontiers of AI.

You recently said, “I’ve seen people learn to be more creative.” Can you explain?

The question is, how does one create new ideas? Is it those unpredictable lone acts of genius, people like Steve Jobs, who are special in some way? Or is it something that can be taught and that one can be systematic about?

I believe that the ability to innovate and to be creative are teachable processes. There are ways by which people can systematically innovate or systematically become creative. One thing I’ve been doing at Baidu is running a workshop on the strategy of innovation. The idea is that innovation is not these random unpredictable acts of genius, but that instead one can be very systematic in creating things that have never been created before.

In my own life, I found that whenever I wasn’t sure what to do next, I would go and learn a lot, read a lot, talk to experts. I don’t know how the human brain works but it’s almost magical: when you read enough or talk to enough experts, when you have enough inputs, new ideas start appearing. This seems to happen for a lot of people that I know.

When you become sufficiently expert in the state of the art, you stop picking ideas at random. You are thoughtful in how to select ideas, and how to combine ideas. You are thoughtful about when you should be generating many ideas versus pruning down ideas.

Now there is a challenge still — what do you do with the new ideas, how can you be strategic in how to advance the ideas to build useful things? That’s another whole piece.

Can you talk about your information diet, how you approach learning?

I read a lot and I also spend time talking to people a fair amount. I think two of the most efficient ways to learn, to get information, are reading and talking to experts. So I spend quite a bit of time doing both of them. I think I have just shy of a thousand books on my Kindle. And I’ve probably read about two-thirds of them.

At Baidu, we have a reading group where we read about half a book a week. I’m actually part of two reading groups at Baidu, each of which reads about half a book a week. I think I’m the only one who’s in both of those groups [laughter]. And my favorite Saturday afternoon activity is sitting by myself at home reading.

 

 

Let me ask about your early influences. Is there something your parents did for you that many parents don’t do that you feel had a lasting impact on your life?

I think when I was about six, my father bought a computer and helped me learn to program. A lot of computer scientists learned to program from an early age, so it’s probably not that unique, but I think I was one of the ones that was fortunate to have had a computer and could learn to start to program from a very young age.

Unlike the stereotypical Asian parents, my parents were very laid back. Whenever I got good grades in school, my parents would make a fuss, and I actually found that slightly embarrassing. So I used to hide them. [Laughter] I didn’t like showing my report card to my parents, not because I was doing badly but because of their reaction.

I was also fortunate to have gotten to live and work in many different places. I was born in the U.K., raised in Hong Kong and Singapore, and came to the U.S. for college. Then for my own studies, I have degrees from Carnegie Mellon, MIT, and Berkeley, and then I was at Stanford.

I was very fortunate to have moved to all these places and gotten to meet some of the top people. I interned at AT&T Bell Labs when it existed, one of the top labs, and then at Microsoft Research. I got to see a huge diversity of points of view.

Is there anything about your education or your early career that you would have done differently? Any lessons you’ve learned that people could benefit from?

I wish we as a society gave better career advice to young adults. I think that “follow your passion” is not good career advice. It’s actually one of the most terrible pieces of career advice we give people.

If you are passionate about driving your car, it doesn’t necessarily mean you should aspire to be a race car driver. In real life, “follow your passion” actually gets amended to, “Follow your passion, among the things that happen to be majors at the university you’re attending.”

But often, you first become good at something, and then you become passionate about it. And I think most people can become good at almost anything.

So when I think about what to do with my own life, what I want to work on, I look at two criteria. The first is whether it’s an opportunity to learn. Does the work on this project allow me to learn new and interesting and useful things? The second is the potential impact. The world has an infinite supply of interesting problems. The world also has an infinite supply of important problems. I would love for people to focus on the latter.

I’ve been fortunate to have repeatedly been able to find opportunities that had a lot of potential for impact and also gave me fantastic opportunities to learn. I think young people optimizing for these two things will often have the best careers.

Our team here has a mission of developing hard AI technologies, advanced AI technologies that let us impact hundreds of millions of users. That’s a mission I’m genuinely excited about.

Do you define importance primarily by the number of people who are impacted?

No, I don’t think the number is the only thing that’s important. Changing hundreds of millions of people’s lives in a significant way, I think that’s the level of impact that we can reasonably aspire to. That is one way of making sure we do work that isn’t just interesting, but that also has an impact.

You’ve talked previously about projects of yours that have failed. How do you respond to failure?

Well, it happens all the time, so it’s a long story. [Laughter] A few years ago, I made a list in Evernote and tried to remember all the projects I had started that didn’t work out, for whatever reason. Sometimes I was lucky and a project worked out in a totally unexpected direction, through luck rather than skill.

But I made a list of all the projects I had worked on that didn’t go anywhere, or that didn’t succeed, or that had much less to show relative to the effort we put into them. Then I tried to categorize them in terms of what went wrong and to do a pretty rigorous post-mortem on them.

So, one of these failures was at Stanford. For a while we were trying to get aircraft to fly in formation to realize fuel savings, inspired by geese flying in a V-shaped formation. The aerodynamics are actually pretty solid. So we spent about a year working on making these aircraft fly autonomously. Then we tried to get the airplanes to fly in formation.

But after a year of work, we realized that there was no way we could control the aircraft with sufficient accuracy to realize fuel savings. If at the start of the project we had thought through the positioning requirements, we would have realized that with the small aircraft we were using, there was just no way we could do it: wind gusts would blow the planes around by far more than the tolerance needed to hold formation.

So one pattern of mistakes I’ve made in the past, hopefully much less now, is doing projects where you do step one, you do step two, you do step three, and then you realize that step four has been impossible all along. I use this specific example in the innovation strategy workshop I mentioned. The lesson is to de-risk projects early.

I’ve become much better at identifying risks and assessing them earlier on. Now when I say things like, “We should de-risk a project early,” everyone will nod their head because it’s just so obviously true. But the problem is when you’re actually in this situation and facing a novel project, it’s much harder to apply that to the specific project you are working on.

The reason is that these sorts of research projects demand a strategic skill. In our educational system, we’re pretty good at teaching facts and procedures, like recipes: how do you cook spaghetti bolognese? You follow the recipe.

But innovation or creativity is a strategic skill, where every day you wake up in a totally unique context that no one’s ever been in, and you need to make good decisions in that completely unique environment. So as far as I can tell, the only way we know to teach strategic skills is by example, by seeing tons of examples. When the human brain sees enough examples, it learns to internalize the rules and guidelines for making good strategic decisions.

Very often, what I find is that for people doing research, it takes years to see enough examples and to internalize those guidelines. So what I’ve been experimenting with here is building a flight simulator for innovation strategy: instead of having everyone spend five years to see enough examples, we deliver many examples in a much more compressed time frame.

It’s just like a real flight simulator. If you want to learn to fly a 747, you might fly for years, maybe decades, before you encounter any emergencies. But in a flight simulator, we can show you tons of emergencies in a very compressed period of time and let you learn much faster. Those are the sorts of things we’ve been experimenting with.

When this lab first opened, you noted that for much of your career you hadn’t seen the importance of team culture, but that you had come to realize its value. Several months in, is there anything you’ve learned about establishing the right culture?

A lot of organizations have cultural documents like, “We empower each other,” or whatever. When you say it, everyone nods their heads, because who wouldn’t want to empower their teammates? But when they go back to their desks five minutes later, do they actually do it? It’s difficult for people to bridge the abstract and the concrete.

At Baidu, we did one thing for the culture that I think is rare. I don’t know of any organization that has done this. We created a quiz that describes to employees specific scenarios — it says, “You’re in this situation and this happens. What do you do: A, B, C, or D?”

No one has ever gotten full marks on this quiz the first time out. I think the quiz’s interactivity, asking team members to apply specifics to hypothetical scenarios, has been our way of trying to connect the abstract culture with the concrete: what do you actually do when a teammate comes to you and does this thing?

What are some books that had a substantial impact on your intellectual development?

Recently I’ve been thinking about the set of books I’d recommend to someone wanting to do something innovative, to create something new.

The first is “Zero to One” by Peter Thiel, a very good book that gives an overview of entrepreneurship and innovation.

We often break down entrepreneurship into B2B (“business to business,” i.e., businesses whose customers are other businesses) and B2C (“business to consumer”). For B2B, I recommend “Crossing the Chasm.” For B2C, one of my favorite books is “The Lean Startup,” which takes a narrower view but gives one specific tactic for innovating quickly. It’s a little narrow, but it’s very good in the area that it covers.

Then to break B2C down even further, two of my favorites are “Talking to Humans,” a very short book that teaches you how to develop empathy for the users you want to serve by talking to them, and “Rocket Surgery Made Easy.” If you want to build products that are important, that users care about, the latter teaches you different tactics for learning about users, through user studies or interviews.

Then finally there is “The Hard Thing About Hard Things.” It’s a bit dark, but it does cover a lot of useful territory on what building an organization is like.

For people who are trying to figure out career decisions, there’s a very interesting one: “So Good They Can’t Ignore You.” That gives a valuable perspective on how to select a path for one’s career.

Do you have any helpful habits or routines?

I wear blue shirts every day, I don’t know if you know that. [laughter] Yes. One of the biggest levers on your own life is your ability to form useful habits.

When I talk to researchers, when I talk to people wanting to engage in entrepreneurship, I tell them that if you read research papers consistently, if you seriously study half a dozen papers a week and you do that for two years, after those two years you will have learned a lot. This is a fantastic investment in your own long term development.

But that sort of investment, if you spend a whole Saturday studying rather than watching TV, there’s no one there to pat you on the back or tell you you did a good job. Chances are what you learned studying all Saturday won’t make you that much better at your job the following Monday. There are very few, almost no short-term rewards for these things. But it’s a fantastic long-term investment. This is really how you become a great researcher, you have to read a lot.

For people who count on willpower to do these things, it almost never works, because willpower peters out. Instead, I think the people who build habits, studying every week, working hard every week, are the ones most likely to succeed.

For myself, one of the habits I have is working out every morning for seven minutes with an app. I find it much easier to do the same thing every morning because it’s one less decision you have to make. It’s the same reason my closet is full of blue shirts. I used to have shirts in two colors, actually, blue and magenta. I thought that was just too many decisions. [Laughter] So now I only wear blue shirts.

You’ve urged policymakers to spend time thinking about a future where computing and robotics have eliminated some substantial portion of the jobs people have now. Do you have any ideas about possible solutions?

It’s a really tough question. Computers are good at routine repetitive tasks. Thus far, the main things that computers have been good at automating are tasks where you kind of do the same thing day after day.

Now this can be at multiple points on the spectrum. Humans work on an assembly line, making the same motion for months on end, and now robots are doing some of that work. A midrange challenge might be truck driving: truck drivers do very similar things day after day, so people are trying to automate that too. It’s harder than most people think, but automated driving might happen in the next decade or so; we don’t know. Then there are even higher-end things: some radiologists read the same types of X-rays over and over each day. Again, computers may gain traction in those areas.

But the social tasks that are non-routine and non-repetitive are the tasks that humans will be better at than computers for quite some time, I think. In many of our jobs we do different things every day: we meet different people, we have to arrange different things, we solve problems differently. Those things are relatively difficult for computers to do, for now.

The challenge that faces us is that, when the U.S. transformed from an agricultural to a manufacturing and services economy, people moved from one routine task, such as farming, to a different routine task, such as manufacturing or working in call centers. A large fraction of the population made that transition, so they’ve been okay; they’ve found other jobs. But many of their jobs are still routine and repetitive.

The challenge that faces us is to find a way to scalably teach people to do non-routine non-repetitive work. Our education system, historically, has not been good at doing that at scale. The top universities are good at doing that for a relatively modest fraction of the population. But a lot of our population ends up doing work that is important but also routine and repetitive. That’s a challenge that faces our educational system.

I think it can be solved. That’s one of the reasons why I’ve been thinking about teaching innovation strategy, teaching creativity strategy. We need to enable a lot of people to do non-routine, non-repetitive tasks. These tactics for teaching innovation and creativity, these flight simulators for innovation, could be one way to get there. I don’t think we’ve figured out yet how to do it, but I’m optimistic it can be done.

You’ve said, “Engineers in China work much harder than the average Silicon Valley engineer. Engineers in Silicon Valley at startups work really hard. At mature companies, I don’t see the same intensity as you do in startups and at Baidu.” Why do you think that is?

I don’t know. I think the individual engineers in China are great. The individual engineers in Silicon Valley are great. The difference I think is the company. The teams of engineers at Baidu tend to be incredibly nimble.

There is much less appreciation for the status quo in the Chinese internet economy and I think there’s a much bigger sense that all assumptions can be challenged and everything is up for grabs. The Chinese internet ecosystem is very dynamic. Everyone sees huge opportunity, everyone sees massive competition. Stuff changes all the time. New inventions arise, and large companies will one day suddenly jump into a totally new business sector.

To give you an idea, here in the United States, if Facebook were to start a brand new web search engine, that might feel like a slightly strange thing to do. Why would Facebook build a search engine? It’s really difficult. But that sort of thing is much more thinkable in China, where there is more of an assumption that there will be new creative business models.

This seems to suggest a different management culture, where you can make important decisions quickly and have them be intelligent and efficient and not chaotic. Is Baidu operating in a unique way that you feel is particularly helpful to its growth?

Gosh, that’s a good question. I’m trying to think what to point to. I think decision making is pushed very far down in the organization at Baidu. People have a lot of autonomy, and they are very strategic. One of the things I really appreciate about the company, especially the executives, is there’s a very clear-eyed view of the world and of the competition.

When executives meet, and in the way we speak with the whole company, there is a refreshing absence of bravado. The statements that are made internally are along the lines of: “We did a great job on that. We’re not so happy with those things. This is going well. This is not going well. These are the things we think we should emphasize. And let’s do a post-mortem on the mistakes we made.” There’s just a remarkable lack of bravado, and I think this gives the organization great context on the areas to innovate and focus on.

You’re very focused on speech recognition, among other problems. What are the challenges you’re facing that, when solved, will lead to a significant jump in the accuracy of speech recognition technology?

We’re building machine learning systems for speech recognition. Some of the machine learning technologies we’re using now have been around for decades. It was only in the last several years that they’ve really taken off.

Why is that? I often make an analogy to building a rocket ship. A rocket ship is a giant engine together with a ton of fuel, and both need to be really big. If you have a lot of fuel and a tiny engine, you won’t get off the ground. If you have a huge engine and a tiny amount of fuel, you can lift off, but you probably won’t make it to orbit. So you need a big engine and a lot of fuel.

The reason machine learning is really taking off now is that we finally have the tools to build the big rocket engine: giant computers. And the fuel is the data. We are finally getting the data that we need.

The digitization of society creates a lot of data, and we’ve been creating data for a long time now. But it was only in the last several years that we’ve finally been able to build big enough rocket engines to absorb the fuel. So a lot of our approach to speech recognition, though not the whole thing, is finding ways to build bigger engines and get more rocket fuel.

For example, here is one thing we did; it’s a little technical. Where do you get a lot of data for speech recognition? We start from audio data: other groups use maybe a couple thousand hours of it, while we use a hundred thousand hours. That is much more rocket fuel than what you see in academic literature.

Then, if we have an audio clip of you saying something, we take that clip and add background noise to it, like sound recorded in a cafe, so we synthesize an audio clip of what you would sound like if you were speaking in that cafe. By synthesizing your voice against lots of backgrounds, we multiply the amount of data that we have. We use tactics like that to create more data to feed to our machines, to feed to our rocket engines.
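
To make the idea concrete, here is a minimal sketch of that kind of augmentation: overlaying a clean speech clip with a background-noise clip at a chosen signal-to-noise ratio. The `mix_at_snr` helper and the synthetic arrays are hypothetical illustrations, not Baidu’s actual pipeline.

```python
# A minimal, illustrative sketch of noise augmentation for speech data:
# overlay a clean clip with background noise at a chosen signal-to-noise
# ratio (SNR). Hypothetical helper, not Baidu's actual pipeline.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay `noise` on `speech` so the mix has roughly `snr_db` dB SNR."""
    # Loop the noise so it covers the whole speech clip, then trim it.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Pick a gain so that 10 * log10(speech_power / (gain**2 * noise_power))
    # equals the requested SNR in decibels.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

# Every (clip, noise source, SNR) combination becomes a new training
# example, which is how a corpus gets multiplied many times over.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16_000)  # stand-in for 1 s of 16 kHz speech
cafe = rng.standard_normal(8_000)    # stand-in for recorded cafe noise
augmented = [mix_at_snr(clean, cafe, snr) for snr in (20.0, 10.0, 5.0)]
```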

One thing about speech recognition: most people don’t understand the difference between 95 and 99 percent accuracy. Ninety-five percent means you get one in 20 words wrong. That’s just annoying; it’s painful to go back and correct it on your cell phone.

Ninety-nine percent is game changing. At 99 percent, it becomes reliable; it just works and you use it all the time. So this is not just a four percent incremental improvement; it’s the difference between people rarely using it and people using it all the time.
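
For concreteness, here is the arithmetic behind that comparison, applied to a hypothetical 200-word dictated message:

```python
# Expected number of wrong words in a 200-word dictation at each
# accuracy level (simple expected-value arithmetic from the text).
for accuracy in (0.95, 0.99):
    expected_errors = (1 - accuracy) * 200
    print(f"{accuracy:.0%} accurate -> about {expected_errors:.0f} wrong words per 200")
# 95% accurate -> about 10 wrong words per 200
# 99% accurate -> about 2 wrong words per 200
```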

So what is the hurdle to 99 percent at this point?

We need even bigger rocket engines and we still need even more rocket fuel. Both are still constrained and the two have to grow together. We’re still working on pushing that boundary.