Machine teaching with Microsoft’s Project Bonsai
By Simon Bisson, Columnist, InfoWorld
With machine learning (ML) at the heart of much of modern computing, the interesting question is: How do machines learn? There’s a lot of deep computer science in machine learning. Some models use feedback techniques to improve, while others are trained on massive data sets so that they can use statistical techniques to infer results. But what happens when you don’t have the data to build a model using these techniques? Or when you don’t have the data science skills available?
Not everything that we want to manage with machine learning generates vast amounts of data or has the labeling necessary to make that data useful. In many cases, we might not have the needed historic data sets. Perhaps we’re automating a business process that’s never been instrumented or working in an area where human intervention is critical. In other cases we might be trying to defend a machine learning system from adversarial attacks, finding ways to work around poisoned data. This is where machine teaching comes in: it works with experts to guide machine learning algorithms towards a target.
Microsoft has been at the forefront of AI research for some time, and the resulting Cognitive Services APIs are built into Azure’s platform. Microsoft now offers tools for developing and training your own models using big data stored in Azure. However, those traditional machine learning platforms and tools aren’t its only offering: the Project Bonsai low-code development tool offers a simple way of using machine teaching to drive ML development for industrial AI.
Delivered as part of Microsoft’s Autonomous Systems suite, Project Bonsai is a tool for building and training machine learning models, using a simulator with human input to allow experts to build models without needing programming or machine learning experience. It doubles as a tool for delivering explainable AI, as the machine teaching phase of the process shows how the underlying ML system came to a decision.
At the heart of Project Bonsai is the concept of the training simulation. A simulation models the real-world system that you want to control with your machine learning application, so you need to build it using familiar engineering simulation software, such as MATLAB’s Simulink, or as custom code running in a container. If you’re already using simulators as part of a control system development environment or as a training tool, these can be repurposed for use with Project Bonsai.
Training simulators that have a user interface are a useful tool here, as they can capture user input as part of the training process. Simulators need to make it very clear when an operation has failed, why it has failed, and how the failure happened. This information can be used as input to the training tool, helping teach the model where errors may occur and enabling it to spot the signs of an error developing. For example, a simulator being used to train a Project Bonsai model to control an airport luggage system could indicate that running conveyors too fast will cause luggage to fall off, and that running them too slowly can cause bottlenecks. The system then learns to find an optimum speed for maximum throughput of bags.
There’s a close link between Project Bonsai and control systems, especially those that take advantage of modern control theory to manage systems within a set of boundaries. To work well with ML models, a simulator needs to give a good picture of how the simulated object or service responds to inputs and delivers appropriate outputs. You need to be able to set a specific start state, allowing the simulator and the ML model to adapt to changing conditions. The inputs need to be quantified so that your ML system can make discrete changes to the simulator, for example, speeding up our simulated baggage system by 1 m/s.
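To make those simulator requirements concrete, here is a minimal Python sketch of a toy baggage-conveyor simulator. It is not a Project Bonsai API: the class name, the speed thresholds, and the throughput formula are all invented for illustration. What it tries to show is the shape described above: a settable start state, quantized inputs, and a failure signal that says both that something went wrong and why.

```python
from dataclasses import dataclass
from typing import Optional

# All names and numbers here are hypothetical; this is not a Project Bonsai API.
MAX_SAFE_SPEED = 3.0    # m/s; above this, bags fall off the belt
MIN_USEFUL_SPEED = 1.0  # m/s; below this, bags back up into a bottleneck
SPEED_STEP = 1.0        # m/s; size of one discrete speed change


@dataclass
class StepResult:
    speed: float           # sensor reading: belt speed in m/s
    throughput: float      # sensor reading: bags delivered this step (toy formula)
    failed: bool           # True when the belt is in a failure state
    reason: Optional[str]  # plain-language explanation of the failure, if any


class ConveyorSim:
    def reset(self, start_speed: float = 2.0) -> StepResult:
        """Put the simulator into a specific, repeatable start state."""
        self.speed = start_speed
        return self._observe()

    def step(self, action: int) -> StepResult:
        """Apply one quantized action: -1 slow down, 0 hold, +1 speed up."""
        if action not in (-1, 0, 1):
            raise ValueError("action must be -1, 0, or +1")
        self.speed = max(0.0, self.speed + action * SPEED_STEP)
        return self._observe()

    def _observe(self) -> StepResult:
        if self.speed > MAX_SAFE_SPEED:
            return StepResult(self.speed, 0.0, True,
                              "too fast: luggage fell off the belt")
        if self.speed < MIN_USEFUL_SPEED:
            return StepResult(self.speed, 0.0, True,
                              "too slow: bottleneck at the loading point")
        return StepResult(self.speed, self.speed * 10.0, False, None)


sim = ConveyorSim()
state = sim.reset(start_speed=2.0)
state = sim.step(+1)  # 3.0 m/s, still inside the safe band
state = sim.step(+1)  # 4.0 m/s, reported as a failure with a reason
```

A real simulator would be far richer, but the same three ingredients (reset to a known state, discrete actions, explicit failure reasons) are what a training tool can learn from.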
Getting the right simulator is probably the hardest aspect of working with Project Bonsai. You may not need data science skills, but you definitely need simulation skills. It’s a good idea to work with subject matter experts as well as simulation experts to build your simulator and make it as accurate as possible. A simulation that diverges from the real-world system you intend to manage with ML will result in a badly trained model.
Once you have a simulation, you can start to teach your Project Bonsai ML model in the Training Engine. Microsoft calls these models “brains,” as they’re based on neural networks. The Training Engine has four modules: an architect, an instructor, a learner, and a predictor. The architect uses the training curriculum to choose and optimize a learning algorithm (currently one of three options: Distributed Deep Q Network, Proximal Policy Optimization, or Soft Actor Critic).
Once the architect has selected a learning model, the instructor runs through the training plan, interactively driving the simulator and responding to outputs from the learner. You can perhaps think of the instructor and the learner as a pair: the learner is where the ML model is trained, using the chosen algorithm and data from the simulator along with inputs from the instructor. Once the learning process is complete, the system delivers a predictor, a trained model with an API endpoint that runs as an inferencing engine rather than continuing to train. The predictor’s outputs can be compared with outputs from the learner to test whether changes improve the model.
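The division of labor between instructor, learner, and predictor can be sketched in a few lines of Python. This is a toy under loud assumptions: it does not use any Project Bonsai code, every name in it is hypothetical, and the learner uses plain tabular Q-learning rather than any of the three algorithms the architect actually chooses from. The simulator is a five-level conveyor where levels 0 and 4 are failures and reward grows with throughput.

```python
import random

# Toy stand-ins only: none of these names come from Project Bonsai.
ACTIONS = (-1, 0, 1)   # slow down, hold, speed up
FAIL_LEVELS = (0, 4)   # belt stalled / luggage falling off


def simulate(level, action):
    """Tiny simulator: returns (next_level, reward, failed)."""
    nxt = min(4, max(0, level + action))
    if nxt in FAIL_LEVELS:
        return nxt, -10.0, True
    return nxt, float(nxt), False  # reward grows with throughput


class Learner:
    """Tabular Q-learning, standing in for the algorithm the architect picks."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
        self.alpha, self.gamma = alpha, gamma

    def best_action(self, level):
        return max(ACTIONS, key=lambda a: self.q[(level, a)])

    def update(self, s, a, reward, s2, failed):
        future = 0.0 if failed else max(self.q[(s2, b)] for b in ACTIONS)
        target = reward + self.gamma * future
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])


def instruct(learner, episodes=2000, steps=20, epsilon=0.2, seed=0):
    """Instructor: drives the simulator and feeds each result to the learner."""
    rng = random.Random(seed)
    for _ in range(episodes):
        level = rng.choice((1, 2, 3))  # chosen start state
        for _ in range(steps):
            explore = rng.random() < epsilon
            action = rng.choice(ACTIONS) if explore else learner.best_action(level)
            nxt, reward, failed = simulate(level, action)
            learner.update(level, action, reward, nxt, failed)
            if failed:
                break
            level = nxt


def make_predictor(learner):
    """Predictor: a frozen copy of the learned policy, inference only."""
    policy = {s: learner.best_action(s) for s in (1, 2, 3)}
    return lambda level: policy[level]


learner = Learner()
instruct(learner)
predict = make_predictor(learner)
```

After training, `predict(level)` returns the action the frozen policy would take at that speed level without touching the learner again, which is the sense in which a predictor runs inference rather than training.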
Machine teaching, at least in Project Bonsai, is focused on reaching specific goals. You can think of these much like the boundary conditions for a control model. The goals available are relatively simple, for example setting something to be avoided or setting a target to be reached as quickly as possible. Other goals include setting maximum or minimum values and keeping a system near a specific target value. The training engine will work to support as many goals as you set in your training curriculum. Goals like these simplify machine learning considerably. There’s no need to build complex training algorithms; all that’s necessary is to define the targets that your ML model will need to reach and Project Bonsai handles the rest for you.
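As a rough illustration of how goals like these differ from hand-written training logic, the Python sketch below expresses three of them (avoid, reach, and stay near a target) as checks over a recorded episode. The function names and thresholds are hypothetical and only loosely echo the goal types described above; this is not Inkling and not how Project Bonsai evaluates goals internally.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Goal = Callable[[List[State]], bool]


def avoid(condition: Callable[[State], bool]) -> Goal:
    """Satisfied only if the condition never holds during the episode."""
    return lambda episode: not any(condition(s) for s in episode)


def reach(condition: Callable[[State], bool]) -> Goal:
    """Satisfied if the condition holds at least once during the episode."""
    return lambda episode: any(condition(s) for s in episode)


def drive(key: str, target: float, tolerance: float) -> Goal:
    """Satisfied if the value gets within tolerance of target and then stays there."""
    def check(episode: List[State]) -> bool:
        near = [abs(s[key] - target) <= tolerance for s in episode]
        if True not in near:
            return False
        return all(near[near.index(True):])
    return check


def evaluate(goals: Dict[str, Goal], episode: List[State]) -> Dict[str, bool]:
    """Report, goal by goal, whether an episode met it."""
    return {name: goal(episode) for name, goal in goals.items()}


# Hypothetical goals for the conveyor example; the thresholds are made up.
goals = {
    "avoid_spill": avoid(lambda s: s["speed"] > 3.5),
    "reach_full_speed": reach(lambda s: s["speed"] >= 3.0),
    "hold_near_3": drive("speed", target=3.0, tolerance=0.25),
}

good = [{"speed": v} for v in (1.0, 2.0, 3.0, 3.0, 3.0)]
bad = [{"speed": v} for v in (2.0, 3.0, 4.0)]

good_report = evaluate(goals, good)  # every goal is met
bad_report = evaluate(goals, bad)    # only "reach_full_speed" is met
```

The point of the sketch is that the expert only states what a good outcome looks like; working out how to get there is left to the training engine.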
The output of Project Bonsai is a machine learning model with the endpoints needed for your code to work. The model can be updated over time, adding new goals and refining the training as necessary, comparing predicted results with actual operations.
The teaching curriculum is written in a language called Inkling. It’s a domain-specific language that takes named objects from a simulator, linking sensors and actuators. Inkling uses sensors to get states, and actuators to drive actions, with what it calls “concept nodes” to describe the goals. It’s not hard to learn Inkling, and most subject matter experts should be able to write a simple training module very quickly. More complex models can be built by adding more functions to an Inkling application. Microsoft provides a complete Inkling language reference, which should help you get started writing Project Bonsai training curricula.
Project Bonsai runs on Azure, and you will need to budget for its operations. Models and simulators are stored in the Azure Container Registry, using containers to run simulations. Logs are managed using Azure Monitor, and Azure Storage holds archived simulators. Costs shouldn’t be too high, but it’s worth monitoring them and removing unwanted resource groups once you have trained your models.
Machine teaching provides an alternative approach to ML development that works well with control problems, such as working with industrial equipment. It avoids needing large amounts of data, and because goals are used to teach a model, that model can be trained by anyone with an understanding of the problem and basic programming skills. It’s not quite a no-code system, as training needs to be written in Inkling, and you need expert input in writing and instrumenting a simulator to run inside the Project Bonsai training environment. With a well-designed training curriculum and an accurate simulation, you should be able to build what used to be very complex ML models surprisingly quickly, moving machine learning from predictions to control.
Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson has worked in academic and telecoms research, been the CTO of a startup, run the technical side of UK Online, and done consultancy and technology strategy.
Copyright © 2021 IDG Communications, Inc.