The internet has a huge amount of freely available video that we can learn from. You can watch a person make a gorgeous presentation, a digital artist draw a beautiful sunset, or a Minecraft player build an intricate house. However, these videos only record what happened, not precisely how it was achieved: you won't know the exact sequence of mouse movements and keys pressed. If we want to build large-scale foundation models in these domains, as has been done in language with GPT, this lack of action labels poses a new challenge. The challenge does not arise in language, where the "action labels" are simply the next words in a sentence.
In order to utilize the wealth of unlabeled video data available on the internet, we introduce a novel, yet simple, semi-supervised imitation learning method: Video PreTraining (VPT). We begin by gathering a small dataset from contractors, recording not only their video but also the actions they took, which in our case are keypresses and mouse movements. With this data we train an inverse dynamics model (IDM) that predicts the action taken at each step of the video. Importantly, the IDM can use both past and future information to predict the action at each step. This task is far easier than behavioral cloning, which must predict actions from past video frames only and therefore requires inferring what the person wants to do and how to accomplish it. We can then use the trained IDM to label a much larger dataset of online videos and learn to act via behavioral cloning.
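The three stages described above can be sketched on a toy problem. This is only an illustration under loud assumptions, not the authors' implementation: the "video frames" are integer positions on a line, the "actions" are steps of -1, 0, or +1, the IDM looks only at the current and next frame (a simplification of "past and future"), and every function name here is hypothetical.

```python
import random
from collections import Counter, defaultdict

def expert_policy(pos, goal=5):
    """Hypothetical 'human': step toward the goal, then stand still."""
    return (goal > pos) - (goal < pos)

def record_episode(start, steps=12):
    """Return (frames, actions); frames has one more entry than actions."""
    frames, actions = [start], []
    for _ in range(steps):
        action = expert_policy(frames[-1])
        actions.append(action)
        frames.append(frames[-1] + action)
    return frames, actions

def train_idm(labeled_episodes):
    """Non-causal IDM: predicts the action from frame t AND frame t+1."""
    table = defaultdict(Counter)
    for frames, actions in labeled_episodes:
        for t, action in enumerate(actions):
            table[frames[t + 1] - frames[t]][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in table.items()}

def pseudo_label(idm, frames):
    """Label an action-free video with the trained IDM."""
    return [idm[frames[t + 1] - frames[t]] for t in range(len(frames) - 1)]

def behavioral_cloning(episodes):
    """Causal policy: predicts the action from the current frame only."""
    table = defaultdict(Counter)
    for frames, actions in episodes:
        for t, action in enumerate(actions):
            table[frames[t]][action] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in table.items()}

random.seed(0)
# Stage 1: a small contractor dataset that includes actions.
contractor = [record_episode(start) for start in (0, 10)]
idm = train_idm(contractor)
# Stage 2: a larger set of "online videos" with frames only, labeled by the IDM.
web_videos = [record_episode(random.randint(-10, 20))[0] for _ in range(200)]
labeled_web = [(frames, pseudo_label(idm, frames)) for frames in web_videos]
# Stage 3: behavioral cloning on the pseudo-labeled data.
policy = behavioral_cloning(labeled_web)
```

In this toy the IDM's job is trivial because it sees the next frame, while the cloned policy has to decide from the current frame alone; the pseudo-labeled videos also cover positions the two contractor episodes never visited.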
VPT Zero-Shot Results
We chose to validate our method in Minecraft because (1) it is one of the most popular video games in the world, so there is a wealth of freely available video data, and (2) it is open-ended, with a wide variety of things to do, similar to real-world applications such as computer usage. In a departure from previous Minecraft AI work, our AI uses the native human interface: a 20Hz framerate with mouse and keyboard.
Trained on 70,000 hours of IDM-labeled online video, our behavioral cloning model (the "VPT foundation model") accomplishes tasks in Minecraft that are nearly impossible to achieve with reinforcement learning from scratch. It learns to chop down trees to collect logs, craft those logs into planks, and then craft those planks into a crafting table. This sequence takes a human proficient in Minecraft approximately 50 seconds, or 1,000 consecutive game actions.
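The action counts quoted in this article follow directly from the 20Hz control rate. A minimal check, where the helper name `actions_for` is purely illustrative:

```python
ACTIONS_PER_SECOND = 20  # the 20Hz rate stated in the text

def actions_for(seconds, rate_hz=ACTIONS_PER_SECOND):
    """Number of consecutive actions taken in `seconds` of play."""
    return int(seconds * rate_hz)

crafting_table_actions = actions_for(50)        # about 50 seconds of play
diamond_pickaxe_actions = actions_for(20 * 60)  # about 20 minutes of play
print(crafting_table_actions, diamond_pickaxe_actions)
```

The first value matches the 1,000 actions quoted for the crafting table, and the second matches the 24,000 actions quoted for the diamond pickaxe in the reinforcement learning section.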
Additionally, the model performs other complex skills that humans often use in the game, such as swimming and hunting animals for food. It also learned the skill of "pillar jumping", a common Minecraft behavior of elevating yourself by repeatedly jumping and placing a block underneath yourself.
Fine-tuning through Behavioral Cloning
Foundation models are designed to have a broad behavior profile and be generally capable across a wide variety of tasks. To incorporate new knowledge or let them specialize on a narrower task distribution, it is common to fine-tune these models on smaller, more specific datasets. To demonstrate how well the VPT foundation model can be fine-tuned to downstream datasets, we asked our contractors to play for 10 minutes in brand-new Minecraft worlds and build a house from basic Minecraft materials. We hoped this would improve the foundation model's ability to reliably perform "early game" skills such as building crafting tables. When fine-tuning on this dataset, not only do we see a massive improvement in reliably performing the early-game skills already present in the foundation model, but the fine-tuned model also learns to go even deeper into the technology tree by crafting both wooden and stone tools. Sometimes we even see the agent searching through villages, including raiding chests.
Improved early game behavior from BC fine-tuning
Data Scaling
Perhaps the most important hypothesis in our work is that it is far more effective to use labeled contractor data to train an IDM (as part of the VPT pipeline) than to directly train a BC foundation model on that same small contractor dataset. To verify this hypothesis, we train foundation models on increasing amounts of data, from 1 to 70,000 hours. Models trained on fewer than 2,000 hours use contractor data with ground-truth labels; models trained on more than 2,000 hours use internet data labeled by our IDM. We then take each foundation model and fine-tune it on the house-building dataset described in the previous section.
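The data-source rule for this sweep can be written down in a few lines. This is a sketch only: the text gives the endpoints (1 and 70,000 hours) and the 2,000-hour threshold, but the intermediate sweep sizes below are hypothetical, and treating exactly 2,000 hours as IDM-labeled data is an assumption.

```python
IDM_THRESHOLD_HOURS = 2_000  # threshold stated in the text

def data_source(hours):
    """Which labels a foundation model of this data scale is trained on."""
    if hours < IDM_THRESHOLD_HOURS:
        return "contractor data (ground-truth labels)"
    # Assumption: the boundary case of exactly 2,000 hours falls here.
    return "internet video (IDM labels)"

# Hypothetical sweep; only the endpoints 1 and 70,000 come from the text.
sweep_hours = [1, 10, 100, 1_000, 10_000, 70_000]
plan = {hours: data_source(hours) for hours in sweep_hours}
for hours, source in plan.items():
    print(f"{hours:>6} h -> {source}")
```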
Fine-tuning: Effect of foundation model training data
As foundation model data increases we generally see an improvement in crafting ability. Only at the largest data scale can we see stone tool crafting.
Fine-tuning with Reinforcement Learning
Reinforcement learning (RL) is a powerful way to elicit high, potentially even superhuman, performance when a reward function can be specified. However, many tasks require overcoming hard exploration challenges, and most RL methods tackle these with random exploration priors; for example, models are often incentivized to act randomly via entropy bonuses. The VPT model should be a much better prior for RL because emulating human behavior is likely much more helpful than taking random actions. We set our model the challenging task of collecting a diamond pickaxe, an unprecedented capability in Minecraft made all the more difficult by using the native human interface.
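To make the "entropy bonus" concrete, here is a minimal sketch of the generic technique, not the authors' training code. The coefficient `beta` and both function names are hypothetical; the point is only that subtracting an entropy term from the loss favors more random action distributions.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def loss_with_entropy_bonus(policy_loss, probs, beta=0.01):
    """Lower loss for higher-entropy policies; beta is a hypothetical weight."""
    return policy_loss - beta * entropy(probs)

uniform = [0.25, 0.25, 0.25, 0.25]  # acts completely at random
peaked = [1.0, 0.0, 0.0, 0.0]       # always picks the same action

uniform_loss = loss_with_entropy_bonus(1.0, uniform)
peaked_loss = loss_with_entropy_bonus(1.0, peaked)
print(uniform_loss < peaked_loss)
```

With the same base loss, the uniform policy scores better than the deterministic one, which is the random exploration prior that starting from a VPT model is meant to improve on.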
Crafting a diamond pickaxe requires a long and complicated sequence of subtasks. To make this task tractable, we reward agents for each item in the sequence.
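One way to reward an agent for each item in such a sequence is sketched below. The milestone list, the reward of 1.0 per item, and the choice to reward each item only the first time it is obtained in an episode are all assumptions made for illustration; the text does not give the actual items or values.

```python
# Hypothetical milestone sequence; the actual items and values are not given here.
MILESTONES = (
    "log", "planks", "crafting_table", "stick",
    "wooden_pickaxe", "stone_pickaxe", "diamond", "diamond_pickaxe",
)

def milestone_reward(obtained, rewarded, reward_per_item=1.0):
    """Reward milestones obtained this step that were not rewarded before.

    `rewarded` is the set of milestones already paid out this episode;
    it is updated in place.
    """
    new_items = (set(obtained) & set(MILESTONES)) - rewarded
    rewarded.update(new_items)
    return reward_per_item * len(new_items)

rewarded = set()
first = milestone_reward(["log", "log"], rewarded)    # first log counts once
repeat = milestone_reward(["log"], rewarded)          # no reward for repeats
mixed = milestone_reward(["planks", "dirt"], rewarded)  # non-milestones ignored
print(first, repeat, mixed)
```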
We found that an RL policy trained from a random initialization (the standard RL method) achieves almost no reward: it never learns to collect logs and only rarely collects sticks. In stark contrast, fine-tuning from a VPT model not only learns to craft diamond pickaxes (which it does in 2.5% of 10-minute Minecraft episodes), but even reaches a human-level success rate at collecting all items leading up to the diamond pickaxe. This is the first time anyone has shown a computer agent capable of crafting diamond tools in Minecraft, a task that takes humans over 20 minutes (24,000 actions) on average.
Reward over episodes
Conclusion
VPT paves the way toward allowing agents to learn to act by watching the vast number of videos on the internet. Unlike generative video modeling or contrastive methods, which would yield only representational priors, VPT offers the possibility of directly learning large-scale behavioral priors in more domains than just language. While we only experiment in Minecraft, the game is very open-ended and the native human interface (mouse and keyboard) is very generic, so we believe our results bode well for other similar domains, e.g. computer usage.
Our paper provides more details. To aid future VPT research, we are also open sourcing our contractor data, Minecraft environment, model code, and model weights. We have also partnered with the MineRL NeurIPS competition this year. Contestants can use and fine-tune our models to try to solve many difficult tasks in Minecraft. Those interested can visit the competition webpage and compete for a blue-sky prize of $100,000 in addition to a regular prize pool of $20,000.