Over the past two years, Facebook AI Research (FAIR) has worked with 13 universities around the world to compile the largest-ever data set of first-person video — specifically to train deep-learning image recognition models. AI models trained on the data set could be better at controlling robots that interact with humans, or at interpreting images from smart glasses. “Machines can only help us in our daily lives if they really understand the world through our eyes,” says project lead Kristen Grauman of FAIR.
Such technology could support people who need help at home, or guide people through tasks they are learning to perform. “The video in this data set is much closer to how people observe the world,” says Michael Ryoo, a computer vision researcher at Google Brain and Stony Brook University in New York, who is not involved in Ego4D.
However, the potential abuses are clear and worrying. The research is funded by Facebook, a social media giant recently accused in the Senate of putting profits ahead of people’s well-being, a charge corroborated by MIT Technology Review’s own reporting.
The business model of Facebook and other Big Tech companies is to extract as much data as possible about people’s behavior online and sell it to advertisers. The kind of artificial intelligence outlined in the project could extend that reach to people’s everyday offline behavior, revealing the objects around a person’s home, what activities they enjoy, whom they spend time with, and even where their gaze lingers — an unprecedented amount of personal information.
“Privacy needs to be addressed when you take this from the world of exploratory research to a product,” Grauman says. “That work could even be inspired by this project.”
Ego4D is a step change. The largest previous first-person video data set consists of 100 hours of footage of people in kitchens. The Ego4D data set consists of 3,025 hours of video recorded by 855 people in 73 locations across nine countries (the United States, the UK, India, Japan, Italy, Singapore, Saudi Arabia, Colombia, and Rwanda).
Participants ranged in age and background; some were recruited for their visually interesting professions, such as bakers, mechanics, carpenters, and landscape designers.
Previous data sets typically consist of semi-scripted video clips only a few seconds long. For Ego4D, participants wore head-mounted cameras for up to 10 hours at a time and captured first-person video of unscripted daily activities, such as walking along a street, reading, doing laundry, shopping, playing with pets, playing board games, and interacting with other people. Some of the footage also includes audio, data about where the participants’ gaze was focused, and multiple perspectives on the same scene. It’s the first data set of its kind, Ryoo says.