Learning To Play Minecraft With Video PreTraining (VPT)

The internet contains an enormous amount of publicly available videos that we can learn from. You can watch a person make a gorgeous presentation, a digital artist draw a beautiful sunset, and a Minecraft player build an intricate house. However, these videos only provide a record of what happened, not precisely how it was achieved: the exact sequence of mouse movements and key presses is unknown. This lack of action labels poses a new challenge if we want to build large-scale foundation models in these domains, as we have done in language with GPT. In language, the "action labels" are simply the next words in a sentence.



In order to utilize the wealth of unlabeled video data available on the internet, we introduce a novel, yet simple, semi-supervised imitation learning method: Video PreTraining (VPT). We begin by collecting a small dataset from contractors, recording not only their video but also the actions they take, in this case keypresses and mouse movements. With this data we train an inverse dynamics model (IDM), which predicts the action taken at each step in the video. Importantly, the IDM can use both past and future information to predict the action at each step. This task is much simpler, and requires far less data, than behavioral cloning, which must predict actions from past video frames only and therefore has to infer what the person wants to do and how they will accomplish it. We can then use the trained IDM to label a much larger dataset of online videos and learn to act via behavioral cloning.
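The difference between the two prediction tasks described above comes down to which frames each model is allowed to see. The following is a minimal sketch of that idea only; the function names, the `radius` and `history` parameters, and the window sizes are hypothetical and are not taken from the VPT implementation.

```python
from typing import List


def idm_context(t: int, radius: int, num_frames: int) -> List[int]:
    """Frame indices an IDM-style model may use to label step t.

    Non-causal: frames both before and after t, clipped to the video length.
    """
    start = max(0, t - radius)
    end = min(num_frames - 1, t + radius)
    return list(range(start, end + 1))


def bc_context(t: int, history: int) -> List[int]:
    """Frame indices a behavioral-cloning policy may use to act at step t.

    Causal: only the current frame and earlier frames.
    """
    start = max(0, t - history)
    return list(range(start, t + 1))


if __name__ == "__main__":
    # Illustrative window sizes; the real models may use different ones.
    print(idm_context(t=5, radius=2, num_frames=100))  # [3, 4, 5, 6, 7]
    print(bc_context(t=5, history=2))                  # [3, 4, 5]
```

Because the IDM sees what happens after step 5, it only has to explain an observed change, whereas the policy has to decide what to do next without that hindsight.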



VPT Zero-Shot Results



We chose to validate our method in Minecraft because (1) it is one of the most actively played video games in the world and therefore has a wealth of freely available video data, and (2) it is open-ended, with a wide variety of things to do, similar to real-world applications such as computer usage. Unlike prior works in Minecraft that use simplified action spaces aimed at easing exploration, our AI uses the much more generally applicable, though also much more difficult, native human interface: 20Hz framerate with the mouse and keyboard.



Our behavioral cloning model, the "VPT foundation model", is trained on 70,000 hours of IDM-labeled online video. It can accomplish tasks in Minecraft that are nearly impossible to achieve with reinforcement learning from scratch. It learns to chop down trees to collect logs, craft those logs into planks, and then craft those planks into a crafting table. This sequence takes a human proficient in Minecraft approximately 50 seconds, or 1,000 consecutive game actions.
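The action counts quoted in this post are consistent with the 20Hz interface rate, as a quick check shows. This sketch assumes exactly one action per frame, which the post implies but does not state outright; the helper name is hypothetical.

```python
ACTIONS_PER_SECOND = 20  # the 20Hz interface rate stated in the post


def num_actions(seconds: float, hz: int = ACTIONS_PER_SECOND) -> int:
    """Number of actions taken in `seconds`, assuming one action per frame."""
    return round(seconds * hz)


if __name__ == "__main__":
    print(num_actions(50))       # 1000: the crafting-table sequence
    print(num_actions(20 * 60))  # 24000: the 20-minute diamond pickaxe figure
```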



The model also exhibits other complex skills that humans often use in the game, such as swimming, hunting animals for food, and eating that food. It also learned to "pillar jump", a common behavior in Minecraft in which you elevate yourself by repeatedly jumping and placing a block underneath yourself.



Fine-Tuning with Behavioral Cloning



Foundation models are designed to have a broad behavioral profile and to be generally capable across a wide variety of tasks. To incorporate new knowledge or allow them to specialize on a narrower task distribution, it is common practice to fine-tune these models on smaller, more specific datasets. As a case study of how well the VPT foundation model can be fine-tuned to downstream datasets, we asked our contractors to play for 10 minutes in brand new Minecraft worlds and build a house from basic Minecraft materials. We hoped this would improve the foundation model's ability to reliably perform "early-game" skills such as building crafting tables. When fine-tuning on this dataset, not only do we see a massive improvement in reliably performing the early-game skills already present in the foundation model, but the fine-tuned model also learns to go even deeper into the technology tree by crafting both wooden and stone tools. Sometimes we even see rudimentary shelter construction and the agent searching through villages, including raiding chests.



Improved early game behavior from BC fine-tuning



Data Scaling



Perhaps the most important hypothesis of our work is that it is far more effective to use labeled contractor data to train an IDM (as part of the VPT pipeline) than to train a BC foundation model directly on that same small contractor dataset. To validate this hypothesis, we train foundation models on increasing amounts of data, from 1 to 70,000 hours. Models trained on less than 2,000 hours of data use the contractor data with ground-truth labels. Models trained on more than 2,000 hours use internet data labeled with our IDM. We then take each foundation model and fine-tune it on the house-building dataset described in the previous section.
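The rule for which data each model in the sweep is trained on can be written down as a one-line switch. This is a sketch only: the function name and return strings are hypothetical, and because the post does not say which source is used at exactly 2,000 hours, that boundary case is assigned to the IDM-labeled data here as an assumption.

```python
CONTRACTOR_LIMIT_HOURS = 2000  # threshold quoted in the post


def training_data_source(hours: float) -> str:
    """Which dataset a foundation model of the given data scale is trained on."""
    if hours < CONTRACTOR_LIMIT_HOURS:
        return "contractor data (ground-truth labels)"
    # Assumption: the post leaves the exactly-2,000-hour case unspecified.
    return "internet data (IDM labels)"


if __name__ == "__main__":
    # The two endpoints of the sweep mentioned in the post.
    for hours in (1, 70_000):
        print(hours, "->", training_data_source(hours))
```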



Effect of foundation model training data on fine-tuning



As foundation model data increases, we generally see an improvement in crafting ability, and only at the largest data scale do we see stone tool crafting emerge.



Fine-Tuning and Reinforcement Learning



When it is possible to specify a reward function, reinforcement learning (RL) can be a powerful method for eliciting high, potentially even super-human, performance. However, many tasks require overcoming hard exploration challenges, and most RL methods tackle these with random exploration priors, e.g. models are often incentivized to act randomly via entropy bonuses. A VPT model should be a much better prior for RL, because emulating human behavior is likely to be far more helpful than taking random actions. We set our model the challenging task of collecting a diamond pickaxe, an unprecedented capability in Minecraft that is made all the more difficult by the native human interface.
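To make the "random exploration prior" concrete, the sketch below computes the entropy of an action distribution and adds it to an objective as a bonus. This is the generic textbook form of an entropy bonus, not the specific RL setup used in this work; the function names and the coefficient value are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def objective_with_bonus(expected_reward: float,
                         probs: Sequence[float],
                         coef: float = 0.01) -> float:
    """Expected reward plus an entropy bonus; `coef` is an illustrative value."""
    return expected_reward + coef * entropy(probs)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]  # acts completely at random
    peaked = [0.97, 0.01, 0.01, 0.01]   # nearly deterministic
    # With equal reward, the bonus favors the more random policy.
    print(objective_with_bonus(1.0, uniform) > objective_with_bonus(1.0, peaked))
```

The bonus rewards randomness regardless of whether the random actions are useful, which is why a prior learned from human behavior is expected to explore more effectively.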



The process of crafting a diamond pickaxe is complicated and requires many subtasks. To make this task tractable, we reward agents for each item in the sequence.
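Rewarding each item in the sequence can be sketched as a milestone-style reward that pays out the first time an item appears in the agent's inventory. The item list, the reward values, the once-only payout, and the class name below are all assumptions made for illustration; the post does not specify them.

```python
from typing import Dict, Iterable, Set

# Hypothetical item sequence and reward values, for illustration only.
MILESTONES: Dict[str, float] = {
    "log": 1.0,
    "planks": 1.0,
    "crafting_table": 1.0,
    "wooden_pickaxe": 1.0,
    "stone_pickaxe": 1.0,
    "diamond_pickaxe": 1.0,
}


class MilestoneReward:
    """Pays a reward the first time each milestone item is obtained."""

    def __init__(self, milestones: Dict[str, float]) -> None:
        self.milestones = milestones
        self.rewarded: Set[str] = set()

    def step(self, inventory: Iterable[str]) -> float:
        """Return the reward for any milestone items newly seen in `inventory`."""
        reward = 0.0
        for item in inventory:
            if item in self.milestones and item not in self.rewarded:
                reward += self.milestones[item]
                self.rewarded.add(item)
        return reward


if __name__ == "__main__":
    shaper = MilestoneReward(MILESTONES)
    print(shaper.step(["log"]))            # 1.0: first log
    print(shaper.step(["log", "planks"]))  # 1.0: planks are new, log is not
    print(shaper.step(["log", "planks"]))  # 0.0: nothing new
```

Intermediate rewards of this kind give the agent a learning signal long before it reaches the final item, which is what makes a long sequential task tractable.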



We found that an RL policy trained from a random initialization (the standard RL method) achieves almost no reward: it rarely learns to collect logs or sticks. In stark contrast, fine-tuning from a VPT model not only learns to craft diamond pickaxes, which it does in 2.5% of 10-minute Minecraft episodes, but also reaches a human-level success rate at collecting all of the items leading up to the diamond pickaxe. This is the first time anyone has shown a computer agent capable of crafting diamond tools in Minecraft, which takes humans over 20 minutes (24,000 actions) on average.



Reward over episodes



Conclusion



VPT paves the way toward allowing agents to learn to act by watching the vast number of videos on the internet. Unlike generative video modeling or contrastive methods, which would only yield representational priors, VPT offers the possibility of directly learning large-scale behavioral priors in domains other than language. Although we have only experimented in Minecraft, the game is very open-ended and the native human interface (mouse and keyboard) is very generic, so we believe our results bode well for other similar domains, e.g. computer usage.



For more information, please see our paper. To support future research into VPT, we are also open sourcing our contractor data, Minecraft environment, model code, and model weights. In addition, we are partnering with the MineRL NeurIPS competition this year. Contestants can use and fine-tune our models to try to solve many difficult tasks in Minecraft. Those interested can check out the competition webpage and compete for a blue-sky prize of $100,000 in addition to a regular prize pool of $20,000.

