With this backbone, the Perceiver reaches an unprecedented level of generality: it is competitive with domain-specific models on benchmarks based on images, 3D point clouds, and audio and images together. But because the original Perceiver produced only one output per input, it was not as versatile as researchers needed. Perceiver IO fixes this problem by using attention not only to encode inputs into a latent array but also to decode outputs from it, which gives the network great flexibility. Perceiver IO now scales to large and diverse inputs and outputs and can even handle many tasks or data types at once. This opens the door to all sorts of applications, such as understanding the meaning of a text from each of its characters, tracking the movement of all points in an image, processing the sound, images, and labels that make up a video, and even playing games, all while using a single architecture that is simpler than the alternatives.
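To make the encode-then-decode idea concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the published implementation: it omits the learned query/key/value projections, multiple heads, and feed-forward layers, and every name in it (`cross_attention`, `perceiver_io_sketch`, `random_array`, the array sizes) is hypothetical. It only shows how attention can read a large input into a small latent array and how a set of output queries can then read as many outputs as needed back out.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query row attends over all context rows.

    Simplification (assumption): no learned projections, so the context
    rows serve as both keys and values.
    """
    dim = len(queries[0])
    keys_t = [list(col) for col in zip(*context)]  # transpose to dim x n_context
    scores = matmul(queries, keys_t)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, context)


def random_array(rows, dim, rng):
    """Stand-in for real data or learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def perceiver_io_sketch(inputs, latents, output_queries):
    """Encode inputs into the latent array, then decode one row per output query."""
    latents = cross_attention(latents, inputs)       # encode: result has len(latents) rows
    latents = cross_attention(latents, latents)      # process: self-attention on latents
    return cross_attention(output_queries, latents)  # decode: result has len(output_queries) rows


rng = random.Random(0)
DIM = 8
inputs = random_array(50, DIM, rng)          # e.g. 50 input elements
latents = random_array(4, DIM, rng)          # small latent array
output_queries = random_array(12, DIM, rng)  # 12 requested outputs
outputs = perceiver_io_sketch(inputs, latents, output_queries)
print(len(outputs), len(outputs[0]))  # 12 8
```

In this sketch the number of outputs follows the number of output queries rather than the number of inputs, which is the property the paragraph above credits for the added flexibility.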