【中英双语】对你的企业而言,AI 不必太金贵

AI Doesn’t Have to Be Too Complicated or Expensive for Your Business
by Andrew Ng

Despite the vast potential of artificial intelligence (AI), it hasn’t caught hold in most industries. Sure, it has transformed consumer internet companies such as Google, Baidu, and Amazon — all massive and data-rich with hundreds of millions of users. But for projections that AI will create $13 trillion of value a year to come true, industries such as manufacturing, agriculture, and healthcare still need to find ways to make this technology work for them. Here’s the problem: The playbook that these consumer internet companies use to build their AI systems — where a single one-size-fits-all AI system can serve massive numbers of users — won’t work for these other industries.
尽管AI的潜力巨大,但它尚未在大多数行业立足。当然,它已经改变了谷歌、百度和亚马逊等消费类互联网企业——所有这些公司规模都很庞大,拥有数亿用户的大量数据。不过,要实现AI每年创造13万亿美元价值这一预测,制造业、农业和医疗卫生等行业仍然需要找到办法让这项技术为它们服务。问题在于:这些消费类互联网企业用来构建其AI系统的行动手册——一个一体万用的AI系统可以为大量用户服务——对其他这些行业不起作用。
Instead, these legacy industries will need a large number of bespoke solutions that are adapted to their many diverse use cases. This doesn’t mean that AI won’t work for these industries, however. It just means they need to take a different approach.
相反,这些传统产业将需要大量定制的解决方案,以适应其众多不同的用例。然而,这并不意味着AI不适于这些行业。而只意味着他们需要采取不同的方法。
To bridge this gap and unleash AI’s full potential, executives in all industries should adopt a new, data-centric approach to building AI. Specifically, they should aim to build AI systems with careful attention to ensuring that the data clearly conveys what they need the AI to learn. This requires focusing on data that covers important cases and is consistently labeled, so that the AI can learn from this data what it is supposed to do. In other words, the key to creating these valuable AI systems is that we need teams that can program with data rather than program with code.
为了弥合这一差距并释放AI的全部潜力,所有行业的高管都应该采用新的、以数据为中心的方法来构建AI。具体来说,他们在致力于构建AI系统目标时应该小心注意确保数据清楚地传达他们需要让AI学习的内容。这就需要专注于囊括重要案例、并进行了连贯标记的数据,以便AI能够从这些数据中学习它应该要做的事情。换言之,创建这些有价值的AI系统的关键是我们需要能够使用数据编程而不是代码编程的团队。
Why adopting AI outside of tech can be so hard
为何在科技企业之外采用AI如此之难
Why isn’t AI widely used outside consumer internet companies? The top challenges facing AI adoption in other industries include:
为何AI没有在消费类互联网企业之外得到广泛使用?其他行业采用AI面临的最大挑战包括:
1. Small datasets. In a consumer internet company with huge numbers of users, engineers have millions of data points that their AI can learn from. But in other industries, the dataset sizes are much smaller. For example, can you build an AI system that learns to detect a defective automotive component after seeing only 50 examples? Or to detect a rare disease after learning from just 100 diagnoses? Techniques built for 50 million data points don’t work when you have only 50 data points.
数据集规模小。在一家拥有大量用户的消费类互联网公司中,工程师拥有数百万个数据点,他们的AI可以从中学习。但在其他行业,数据集的规模要小得多。比如,你是否能够建立一个AI系统,在只看了50个例子之后就学会检测有缺陷的汽车部件?或者仅仅学习了100例病例诊断之后就能发现一种罕见疾病?当你只有50个数据点时,构建5000万个数据点的技术不起作用。
2. Cost of customization. Consumer internet companies employ dozens or hundreds of skilled engineers to build and maintain monolithic AI systems that create tremendous value, such as an online ad system that generates more than $1 billion in revenue per year. But in other industries, there are numerous $1-5 million projects, each of which needs a custom AI system. For example, each factory manufacturing a different type of product might require a custom inspection system, and every hospital, with its own way of coding health records, might need its own AI to process its patient data. The aggregate value of these hundreds of thousands of projects is massive, but the economics of an individual project might not support hiring a large, dedicated AI team to build and maintain it. This problem is exacerbated by the ongoing shortage of AI talent, which further drives up these costs.
定制成本。消费类互联网公司会雇用数十或数百名技术熟练的工程师来构建和维护能够创造巨大价值的庞大AI系统——比如,每年产生超过10亿美元收入的在线广告系统。但在其他行业,有许多100万-500万美元的项目,每个项目都需要一个定制的AI系统。比如,每一家制造不同类型产品的工厂可能需要定制的检查系统,每一家医院因有自己的病历编码方式可能需要自己的AI来处理患者数据。这些数以十万计的项目总价值十分巨大;但是单个项目的经济规模可能无法支持聘用一个大型、专门的AI团队来构建和维护它。这一问题因AI人才的持续短缺而加剧,从而进一步推高了这些成本。
3. Gap between proof of concept and production. Even when an AI system works in the lab, a massive amount of engineering is needed to deploy it in production. It is not unusual for teams to celebrate a successful proof of concept, only to realize that they still have another 12-24 months of work before the system can be deployed and maintained.
概念验证和用于生产之间的时间差。即使AI系统在实验室奏效,要在生产中部署它也还需要大量的工程。团队庆祝概念验证成功,却发现在系统部署和维护之前,他们还有12-24个月的工作,这是很正常的。
For AI to realize its full potential, we need a systematic approach to solving these problems across all industries. The data-centric approach to AI, supported by tools designed for building, deploying, and maintaining AI applications — called machine learning operations (MLOps) platforms — will make this possible. Companies that adopt this approach faster will have a leg up relative to competitors.
为了让AI充分发挥潜力,我们需要一种系统性的方法来解决各行各业的这些问题。这种以数据为中心应对AI的方法在旨在用来构建、部署和维护AI应用程序的工具——机器学习操作(MLOps)平台的支持下,将有可能让这变成现实。更快采用这一方法的企业将获得比竞争对手更大的优势。
Data-centric AI development
以数据为中心的AI开发
AI systems are made up of software — the computer program that includes an AI model — and data, the information used to train the model. For example, to build an AI system for automated inspection in manufacturing, an AI engineer might create software that implements a deep learning algorithm, that is then shown a dataset comprising pictures of good and defective parts, so it can learn to distinguish between them.
AI系统由软件——包括某种AI模型的计算机程序——和数据(用于培育模型的信息)组成。比如,为了构建一个用于制造业自动化检查的AI系统,AI工程师可能会创建可以执行深度学习算法的软件,然后向其显示一个包含优质零件和有缺陷零件图片的数据集,这样它可以学会区分这些零件。
Over the last decade, a lot of AI research was driven by software-centric development (also called model-centric development), in which the data is fixed and teams attempt to optimize or invent new programs to learn well from the available data. Many tech companies had large datasets from millions of consumers, and they used them to drive a lot of innovation in AI.
在过去的十年中,许多AI研究都是由以软件为中心的开发(也称为以模型为中心的开发)所推动的,数据在这种开发中是固定的。团队试图优化或发明新的程序,以便好好地从现有数据中学习。许多科技企业拥有源自数百万消费者的大型数据集,他们利用这些数据来推动AI的大量创新。
But at AI’s current level of sophistication, the bottleneck for many applications is getting the right data to feed to the software. We’ve heard about the benefits of big data, but we now know that for many applications, it is more fruitful to focus on making sure we have good data — data that clearly illustrates the concepts we need the AI to learn. This means, for example, the data should be reasonably comprehensive in its coverage of important cases and labeled consistently. Data is food for AI, and modern AI systems need not only calories, but also high-quality nutrition.
然而,在AI目前的发展水平上,许多应用程序的瓶颈在于获得正确的数据提供给软件。我们已经听说了大数据的好处,但我们现在知道,对于许多应用程序而言,专注于确保我们拥有优质的数据会更富有成效——这些数据清楚地说明了我们需要让AI学习的概念。这意味着数据在对重要案例的覆盖面上应该适当全面并进行连贯标记。数据是AI的食粮,现代AI系统不仅需要卡路里,还需要高质量的营养。
Shifting your focus from software to data offers an important advantage: it relies on the people you already have on staff. In a time of great AI talent shortage, a data-centric approach allows many subject matter experts who have vast knowledge of their respective industries to contribute to the AI system development.
将重点从软件转变到数据提供了一个重大好处:它依赖的是你的现有员工。在AI人才严重短缺的时代,以数据为中心的方法允许许多在各自行业拥有渊博知识的主题专家为AI系统的开发尽力。
For example, most factories have workers who are highly skilled at defining and identifying what counts as a defect (is a 0.2 mm scratch a defect, or is it so small that it doesn’t matter?). If we expect each factory to ask its workers to invent new AI software as a way to get that factory the bespoke solution it needs, progress will be slow. But if we instead build and provide tools that empower these domain experts to engineer the data, allowing them to express their knowledge about manufacturing through the data they provide to the AI, their odds of success will be much higher.
比如,多数工厂的工人都非常擅长定义和识别什么才算缺陷(0.2mm的划痕是否是缺陷?还是说它小得无关紧要?)。如果我们期望每家工厂要求其工人发明新的AI软件,以此让工厂获得其所需的定制解决方案,那么进展会十分迟缓。但是,如果我们转而构建并提供工具,使这些领域专家能够设计数据——通过向AI提供数据、使他们能够表达自己在制造业方面的知识——他们成功的几率会高得多。
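One way to picture what "engineering the data" could look like in practice is a check for inconsistent labels. The following is a minimal sketch, not a method described in the article: it assumes each part image has been labeled by more than one inspector, and the names `find_inconsistent_labels`, `part_001`, and `inspector_a` are hypothetical placeholders. It flags the parts on which inspectors disagree, so that domain experts can settle the definition of a defect before the data is used for training.

```python
from collections import Counter


def find_inconsistent_labels(annotations):
    """Return items whose annotators disagree, with the label counts.

    `annotations` is a list of (item_id, annotator, label) tuples.
    This input format is an assumption made for the sketch.
    """
    # Collect every label that was assigned to each item.
    labels_by_item = {}
    for item_id, _annotator, label in annotations:
        labels_by_item.setdefault(item_id, []).append(label)

    # Keep only the items that received more than one distinct label.
    inconsistent = {}
    for item_id, labels in labels_by_item.items():
        counts = Counter(labels)
        if len(counts) > 1:
            inconsistent[item_id] = dict(counts)
    return inconsistent


# Hypothetical inspection labels from two inspectors.
annotations = [
    ("part_001", "inspector_a", "defect"),
    ("part_001", "inspector_b", "defect"),
    ("part_002", "inspector_a", "defect"),
    ("part_002", "inspector_b", "ok"),
    ("part_003", "inspector_a", "ok"),
    ("part_003", "inspector_b", "ok"),
]

disputed = find_inconsistent_labels(annotations)
print(disputed)  # {'part_002': {'defect': 1, 'ok': 1}}
```

In this sketch only `part_002` is flagged. A disputed item like this is where a rule such as "scratches shorter than 0.2 mm are not defects" would need to be agreed on and applied to the whole dataset.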
Make building and using AI systematic and repeatable
让构建和使用AI变得系统化且可重复
The shift toward data-centric AI development is being enabled by the emerging field of MLOps, which provides tools that make building, deploying, and maintaining AI systems easier than ever before. Tools that are geared to help produce high-quality datasets, in particular, hold the key to addressing the challenges of small datasets, high cost of customization, and the long road to getting an AI project into production outlined above.
向以数据为中心的AI开发的转变是由新兴的MLOps领域实现的,该领域提供了比以往任何时候都更容易构建、部署和维护AI系统的工具。要特别指出的是,用来帮助生成高质量数据集的工具对于解决数据集小、定制成本高以及AI项目应用于生产的时间长等挑战至关重要。
How, exactly? First, ensuring high-quality data means that AI systems will be able to learn from the smaller datasets available in most industries. Second, by making it possible for a business’s domain experts, rather than AI experts, to engineer the data, the ability to use AI will become more accessible to all industries. And third, MLOps platforms provide much of the scaffolding software needed to take an AI system to production, so teams no longer have to develop this software themselves. This allows teams to deploy AI systems, and bridge the gap between proof of concept and production, in weeks or months rather than years.
到底怎样才能做到?首先,确保数据高质量意味着AI系统能够从多数行业可用的较小数据集中学习。其次,通过让企业的领域专家而非AI专家能够设计数据,所有行业都可以更容易地使用AI。第三,MLOps平台提供了将AI系统应用于生产所需的许多脚手架软件,因此团队不必再开发这种软件。这使得团队能够部署AI系统——并将概念验证与用于生产之间的时间差缩短至数周或数月,而不是几年。
The vast majority of valuable AI projects have yet to be imagined. And even for projects that teams are already working on, the gap that leads to deployment in production remains to be bridged — indeed, Accenture estimates that 80% to 85% of companies’ AI projects are in the proof-of-concept stage.
绝大多数有价值的AI项目尚待构想。即使是各团队已经在进行的项目,逐渐在生产中部署的时间差仍有待缩短——事实上,埃森哲估计,80%到85%的企业的AI项目处于概念验证阶段。
Here are some things companies can do right now:
以下是企业现在可以做的一些事情:
1. Instead of focusing merely on the quantity of data you collect, also consider its quality: make sure it clearly illustrates the concepts you need the AI to learn.
不要只关注所收集数据的数量,也要关注质量,确保它清楚地说明了我们需要让AI学习的概念。
2. Make sure your team considers taking a data-centric approach rather than a software-centric approach. Many AI engineers, including many with strong academic or research backgrounds, were trained to take a software-centric approach; urge them to adopt data-centric techniques as well.
确保团队考虑采用以数据为中心的方法,而不是以软件为中心的方法。许多AI工程师,包括许多具有强大学术或研究背景的工程师,接受过的是以软件为中心的方法培训;要敦促他们也采用以数据为中心的技术。
3. For any AI project that you intend to take to production, be sure to plan the deployment process and provide MLOps tools to support it. For example, even while building a proof of concept system, urge the teams to begin developing a longer-term plan for data management, deployment, and AI system monitoring and maintenance.
对于打算应用于生产的任何AI项目,请确保对部署过程进行规划并提供MLOps工具予以支持。比如,即使在构建概念验证系统时,也要敦促团队开始制定长期计划进行数据管理、部署以及AI系统的监控和维护。
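As an illustration of what the monitoring part of such a plan might include, here is a minimal sketch, assuming a deployed inspection model whose predictions are logged as the strings "defect" and "ok". The function names and the 0.10 tolerance are hypothetical choices for the example, not recommendations from the article or the API of any MLOps platform. It raises a flag when the share of parts predicted as defective moves far from the share observed during the proof of concept.

```python
def defect_rate(predictions):
    """Fraction of predictions that are labeled 'defect'."""
    return sum(1 for p in predictions if p == "defect") / len(predictions)


def drift_alert(baseline_preds, recent_preds, tolerance=0.10):
    """Return True when the defect rate shifts by more than `tolerance`.

    The tolerance is an arbitrary illustrative threshold; a real system
    would choose it from historical variation in the data.
    """
    shift = abs(defect_rate(recent_preds) - defect_rate(baseline_preds))
    return shift > tolerance


# Hypothetical logs: 2 of 20 defects at baseline, 7 of 20 recently.
baseline = ["defect"] * 2 + ["ok"] * 18
recent = ["defect"] * 7 + ["ok"] * 13

needs_review = drift_alert(baseline, recent)
print(needs_review)  # True
```

A flag like this does not say what went wrong. It only signals that the production data may no longer resemble the data the system learned from, which is a cue to inspect and possibly relabel recent examples.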
It’s possible for AI to become a thriving asset outside of data-rich consumer internet businesses, but it has yet to hit its stride in other industries. Because of this, the greatest untapped opportunity for AI may lie in taking it to these other industries. Just as electricity has transformed every industry, AI is on a path to do so too. But the next few steps on that path will require a shift in our playbook for how we build and deploy AI systems. Specifically, a new data-centric mindset, coupled with MLOps tools that allow industry domain experts to participate in the creation, deployment, and maintenance of AI systems, will ensure that all industries can reap the rewards that AI can offer.
AI有可能成为数据丰富的消费类互联网企业之外的一项蓬勃发展的资产,但尚未在其他行业取得进展。但正因为如此,AI尚未开发的最大机会可能在于将其带到其他行业。就像电力改变了每个行业一样,AI也走在同样的道路上。但这条道路上的下几个步骤将要求我们在行动手册中改变构建和部署AI系统的方式。具体而言,新的以数据为中心的思维方式,加上允许行业领域专家参与AI系统创建、部署和维护的MLOps工具,将确保所有行业都能收获AI所能提供的回报。
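Andrew Ng is the founder and CEO of Landing AI, a former vice president and chief scientist of Baidu, co-chairman and co-founder of Coursera, the former founding lead of Google Brain, and an adjunct professor at Stanford University.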
刘隽 | Editor