About a year ago, we wrote “What machine learning means for software development.” In that article, we discussed Andrej Karpathy’s concept of Software 2.0. Karpathy argues that we are at the beginning of a fundamental change in how software is developed. Until now, we have built systems by telling them carefully and meticulously what to do, instruction by instruction. The process is slow, tedious, and error-prone; most of us have spent days staring at a program that should work, but doesn’t. And most of us have been surprised when a program that had been reliable for years suddenly tripped over an unexpected input. The last bug is always the one you find next; if nobody has said that yet, someone should.
Karpathy suggests something completely different: machine learning lets us stop thinking of programming as writing step-by-step instructions in a language like C, Java, or Python. Instead, we can program by example. We can collect many examples of what we want the program to do and what we don’t (examples of correct and incorrect behavior), label them appropriately, and train a model to behave correctly on new inputs. In short, we can use machine learning to automate software development itself.
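To make “programming by example” concrete, here is a minimal sketch in Python: instead of hand-writing rules that decide whether a message is spam, we learn the behavior from labeled examples. The data, labels, and scoring scheme are all invented for illustration; a real system would use a proper ML library and far more data.

```python
# A toy "Software 2.0" program: behavior is learned from labeled
# examples rather than spelled out as hand-written rules.
from collections import Counter

def train(examples):
    """examples: list of (text, label). Returns per-label word counts."""
    counts = {}
    for text, label in examples:
        bag = counts.setdefault(label, Counter())
        bag.update(text.lower().split())
    return counts

def predict(counts, text):
    """Pick the label whose training examples share the most words with text."""
    words = text.lower().split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

# Hypothetical labeled examples of correct and incorrect behavior.
examples = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting notes from monday", "ham"),
    ("agenda for the project meeting", "ham"),
]
model = train(examples)
print(predict(model, "claim a free prize"))     # → spam
print(predict(model, "monday project agenda"))  # → ham
```

The point is not the (deliberately crude) scoring function, but the shape of the workflow: the “program” is the model produced by `train`, and its behavior changes when the examples change, not when the code does.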
It’s time to evaluate what has happened in the year since we wrote that article. Are we seeing the first steps toward the adoption of Software 2.0? Yes, but so far they are small steps. Most companies do not have the AI skills needed to implement Karpathy’s vision. Traditional programming is well understood; training models is not, at least in companies that have not already invested significantly in technology in general or artificial intelligence in particular. Building data pipelines and deploying ML systems are also not well understood. The companies that are systematically developing ML and AI applications are the ones that already have sophisticated AI practices.
This is not to say that we don’t see tools that automate various aspects of software engineering and data science. These tools are starting to appear, particularly for building deep learning models. Adoption of tools such as AWS SageMaker and Google AutoML continues; with AutoML Vision, you can build models without coding. We also see no-code model-building startups like MLJAR and Lobe, and tools focused on computer vision, such as Platform.ai and Matroid. One sign that companies are scaling up their use of ML and AI is the growth of data platforms designed to accelerate the development and deployment of ML at companies that are building teams focused on machine learning and artificial intelligence. Several AI leaders have described the platforms they built internally (such as Uber’s Michelangelo, Facebook’s FBLearner, Twitter’s Cortex, and Apple’s Overton); these companies are influencing others that are starting to build their own tools. Companies like Databricks build software-as-a-service (SaaS) or on-premises tools for companies that are not ready to build their own platforms.
We’ve also seen (and featured at the O’Reilly Artificial Intelligence Conference) Snorkel, an ML-based tool for automated data labeling and synthetic data generation. HoloClean, another tool developed by researchers at Stanford, Waterloo, and Wisconsin, performs automated error detection and correction. As Chris Ré said at our conference, we have made great strides in automating data collection and model building, but labeling and cleaning data have stubbornly resisted automation. At the O’Reilly Artificial Intelligence Conference in Beijing, Tim Kraska of MIT discussed how machine-learned models have outperformed well-known algorithms for database query optimization, disk storage layout, basic data structures, and even process scheduling. The handcrafted algorithms you learn in school may no longer be relevant, because machine learning can do better. Instead of learning how to sort and index, the next generation of programmers may learn how to apply machine learning to these problems.
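Kraska’s “learned index” idea can be caricatured in a few lines: fit a model that predicts where a key sits in a sorted array, then correct the prediction with a short local search. This is only an illustration of the concept (a linear fit standing in for the real learned models, with made-up keys), not an implementation of the published technique.

```python
# Toy "learned index": a linear model guesses a key's position in a
# sorted array; a local search around the guess finds the exact slot.
def fit_linear(keys):
    """Least-squares line: position ≈ a*key + b over the sorted keys."""
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    cov = sum((x - mean_x) * (y - mean_y) for y, x in enumerate(keys))
    var = sum((x - mean_x) ** 2 for x in keys)
    a = cov / var
    return a, mean_y - a * mean_x

def lookup(keys, model, key):
    a, b = model
    guess = min(max(int(round(a * key + b)), 0), len(keys) - 1)
    # Search outward from the model's guess until the key is found.
    for offset in range(len(keys)):
        for i in (guess - offset, guess + offset):
            if 0 <= i < len(keys) and keys[i] == key:
                return i
    return -1  # key not present

keys = [3, 8, 15, 21, 29, 34, 42, 57, 63, 78]  # hypothetical sorted keys
model = fit_linear(keys)
print(lookup(keys, model, 42))  # → 6
```

When the key distribution is close to linear, the model’s guess is usually within a step or two of the true position, which is the intuition behind replacing a B-tree traversal with a model plus a bounded correction.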
One of the most intriguing projects we’ve seen is RISE Lab’s AutoPandas. Given a set of inputs and the outputs they should produce, AutoPandas synthesizes a program that maps those inputs to those outputs. This “programming by example” is an exciting step toward Software 2.0.
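The approach can be caricatured as a search over candidate programs. The sketch below uses a tiny, made-up operation set over plain lists; AutoPandas itself searches a far larger space of pandas API calls, guided by a neural model rather than brute force.

```python
# Toy programming-by-example: find an operation consistent with all
# given (input, output) pairs. The candidate set here is invented.
candidates = {
    "sorted": sorted,
    "reversed": lambda xs: list(reversed(xs)),
    "unique": lambda xs: sorted(set(xs)),
    "doubled": lambda xs: [x * 2 for x in xs],
}

def synthesize(examples):
    """examples: list of (input, expected_output) pairs.
    Returns the name of the first operation matching every example."""
    for name, op in candidates.items():
        if all(op(inp) == out for inp, out in examples):
            return name
    return None  # no candidate explains the examples

print(synthesize([([3, 1, 2], [1, 2, 3]),
                  ([2, 2, 1], [1, 2, 2])]))  # → sorted
```

Note that the second example pair is what rules out `unique`: one ambiguous example rarely pins down a program, so systems like this need either several examples or a prior over which programs are likely.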
What are the biggest barriers to adoption? The same problems AI and ML face everywhere else (and, frankly, that all new technologies face): a shortage of skilled people, difficulty finding the right use cases, and difficulty getting the right data. This is one reason Software 2.0 is having its biggest impact in data science: that is where the skilled people are. They are the same people who know how to collect and preprocess data, and who know how to frame problems that ML systems can realistically solve. With tools like AutoPandas and automated database query optimization, we are just beginning to see AI tools aimed at software developers themselves.
Machine learning also carries real risks, and many companies may not be willing to accept them. Traditional programming is by no means risk-free, but at least its risks are familiar. Machine learning raises the question of explainability: you may not be able to explain why your software does what it does, and there are many application areas (such as medicine and law) where explainability is essential. Reliability is also an issue: a 100% accurate machine learning system cannot be built. If you train a system to manage inventory, how many of its decisions will be wrong? It may make fewer mistakes than humans, but we are more comfortable with mistakes made by humans. We are just beginning to understand the security implications of machine learning, and wherever data is involved, privacy issues are almost certain to follow. Understanding and addressing the risks of ML and AI requires cross-functional teams; these teams need not only people with different kinds of expertise (security, privacy, compliance, ethics, design, and domain knowledge), but also people from different social and cultural backgrounds. Risks that one socio-cultural group accepts without a second thought are often completely intolerable to people from different backgrounds; think, for example, of what the use of facial recognition means to the people of Hong Kong.
However, these problems can be solved. Model governance, model operations, data provenance, and data lineage are emerging as hot topics for people and organizations deploying AI solutions. Knowing where your data came from and how it has been transformed, and understanding how your models evolve over time, are critical steps toward safety. Governance and provenance will become even more important as the use of data becomes regulated, and we are starting to see data-driven companies in highly regulated industries, such as banking and healthcare, take the lead on governance.
We are on the brink of a revolution in how software is built. How far will that revolution go? We don’t know. It’s hard to imagine AI systems designing good user interfaces for humans, though once those interfaces are designed, it’s easy to imagine AI building them. Nor is it easy to imagine AI systems designing good APIs for programmatic access to applications. But it is clear that AI can, and already does, have a big impact on how we develop software. Perhaps the biggest change will be not a reduced need for programmers, but the freedom for programmers to think more about what we do and why. What are the right problems to solve? How do we create software that is useful to everyone? Those are ultimately more important questions than how to build yet another e-commerce application. And if Software 2.0 lets us pay more attention to them, it will be a truly worthwhile revolution.