Hot Chips 2020 Live Blog: Keynote Day 2, Dan Belov of Deepmind (1:30pm PT)
05:02PM EDT - Sustained perf/$ over the whole program
05:02PM EDT - better packing of experiments = 2x utilization
05:01PM EDT - Experiment manager - reorder the workload to improve basic metrics
05:01PM EDT - Big stack can produce lots of opportunity but also lots of waste
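A rough illustration of the packing entry above (our sketch, not from the slides): first-fit-decreasing bin packing, a standard heuristic, applied to a made-up list of experiment sizes. The job sizes, the machine capacity and the function names are all assumptions for illustration; the talk did not say how the experiment manager actually packs work.

```python
def first_fit_decreasing(job_sizes, capacity):
    """Pack jobs onto as few machines as the heuristic finds.

    Jobs are placed largest first into the first machine with room.
    """
    machines = []  # each machine is a list of the job sizes placed on it
    for size in sorted(job_sizes, reverse=True):
        for machine in machines:
            if sum(machine) + size <= capacity:
                machine.append(size)
                break
        else:
            # no existing machine had room, so start a new one
            machines.append([size])
    return machines


def utilization(machines, capacity):
    """Fraction of the allocated capacity that is doing work."""
    used = sum(sum(machine) for machine in machines)
    return used / (len(machines) * capacity)


# Hypothetical experiment sizes, in accelerators per job.
jobs = [4, 3, 2, 1, 5, 5]
one_job_per_machine = [[size] for size in jobs]
packed = first_fit_decreasing(jobs, capacity=8)

print(len(one_job_per_machine), utilization(one_job_per_machine, 8))
print(len(packed), utilization(packed, 8))
```

For this made-up list the packed schedule needs 3 machines instead of 6, so utilization doubles. That the ratio matches the 2x on the slide is a coincidence of the chosen numbers.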
05:01PM EDT - How much compute do you want? All of it!
05:01PM EDT - Infinite backlog of work
05:00PM EDT - Research at scale is about people
05:00PM EDT - Focus on whole systems
05:00PM EDT - 1. Stop thinking about chips - think about systems
05:00PM EDT - Three opportunities for the road ahead
05:00PM EDT - Might be harder to make more fundamental breakthroughs without better hardware
04:59PM EDT - Funding is still here, but not infinite
04:59PM EDT - Using $$$ to meet the requirements
04:59PM EDT - Single CPU has stagnated
04:59PM EDT - Compute demand is growing 10x per year
04:59PM EDT - Algorithms are more efficient now, but still not enough. Networks have been growing and growing
04:58PM EDT - Compute scales, intelligence is tougher
04:58PM EDT - Also allows for internal training consistency
04:57PM EDT - Simple answer is that scaling compute works! More search means better results, more iterations gives better results
04:57PM EDT - This needs a lot of compute
04:57PM EDT - Predicting reality and surpassing physically accurate simulations
04:56PM EDT - Simulate phenomena
04:56PM EDT - interactions processed through neural networks
04:56PM EDT - Interactions between particles -> edges in a graph
04:56PM EDT - Graph neural networks
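To make the graph entries above concrete, here is a minimal sketch (ours, not from the talk) of one message-passing step: particles are nodes, any pair closer than a cutoff radius gets an edge, and each edge carries a message to its receiver. In a real graph network `message_fn` and `update_fn` would be small learned neural networks; the fixed `tanh` and addition used here are placeholders, and every name and number is an assumption.

```python
import math


def build_edges(positions, radius):
    """Connect every ordered pair of particles closer than `radius`."""
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) <= radius:
                edges.append((j, i))  # (sender, receiver)
    return edges


def message_passing_step(features, edges, message_fn, update_fn):
    """One round: compute a message per edge, sum per receiver, update nodes."""
    inbox = [0.0] * len(features)
    for sender, receiver in edges:
        inbox[receiver] += message_fn(features[sender], features[receiver])
    return [update_fn(h, m) for h, m in zip(features, inbox)]


# Placeholder functions standing in for learned networks.
def toy_message(h_sender, h_receiver):
    return math.tanh(0.5 * h_sender - 0.5 * h_receiver)


def toy_update(h, aggregated):
    return h + aggregated


positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]  # third particle is isolated
features = [1.0, 0.0, 2.0]                         # one scalar feature per particle
edges = build_edges(positions, radius=1.5)
new_features = message_passing_step(features, edges, toy_message, toy_update)
print(edges, new_features)
```

The isolated particle receives no messages and keeps its feature unchanged, while the two neighbours move towards each other's values.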
04:55PM EDT - Building models of things like drug discovery etc is tough
04:55PM EDT - Allows for simple games, but also complex visual environments
04:55PM EDT - State trees are predicted imaginary states
04:54PM EDT - Model Based Reinforcement learning - learning the effects of actions
04:53PM EDT - Expand search beyond Go to Chess, Shogi
04:53PM EDT - Pruning the search space is critical
04:53PM EDT - reinforcement learning helps build these training regimes and policies
04:53PM EDT - Policy network to indicate which are the best moves, to reduce search even further
04:52PM EDT - No need to analyse future board positions that are known to give losing games
04:52PM EDT - End up with a value network to explore branches in a search tree
04:52PM EDT - Value network - use likelihood of winning from a given position based on previous games known as future stones
04:51PM EDT - Very difficult to point score a given board position
04:51PM EDT - Here's a search tree for brute force search
04:51PM EDT - Simple game for structure, complex game to master
04:50PM EDT - Now for Go
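A toy version of the search ideas in the Go entries above (our sketch, not DeepMind's code): a policy keeps only the top few moves at each node, and a value function scores a position once the search hits a depth limit, so the tree is never expanded to the end of the game. Go is far too large for a snippet, so this uses a subtraction game (take 1 to 3 stones, whoever takes the last stone wins). `toy_value` and `toy_policy` are hand-written stand-ins for the learned value and policy networks, and all names are assumptions.

```python
def legal_moves(pile):
    return [take for take in (1, 2, 3) if take <= pile]


def toy_value(pile):
    """Stand-in for a value network: score for the player about to move."""
    if pile == 0:
        return -1.0  # the opponent just took the last stone
    return 1.0 if pile % 4 != 0 else -1.0


def toy_policy(pile):
    """Stand-in for a policy network: legal moves, most promising first."""
    return sorted(legal_moves(pile), key=lambda take: (pile - take) % 4 != 0)


def search(pile, depth, top_k, counter):
    """Negamax search, cut off by depth and narrowed to the policy's top_k moves."""
    counter[0] += 1  # count visited positions
    if pile == 0 or depth == 0:
        return toy_value(pile)
    moves = toy_policy(pile)[:top_k]  # top_k=None keeps every move
    return max(-search(pile - take, depth - 1, top_k, counter) for take in moves)


full_count, pruned_count = [0], [0]
full_result = search(10, depth=10, top_k=None, counter=full_count)
pruned_result = search(10, depth=2, top_k=1, counter=pruned_count)
print(full_result, full_count[0])
print(pruned_result, pruned_count[0])
```

Both searches agree on the outcome here only because the stand-in value function happens to be exact for this game; a learned network would be approximate, which is why it is combined with some search rather than trusted outright.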
04:50PM EDT - better and more diverse data to improve
04:50PM EDT - combined with policy optimization
04:50PM EDT - Iterative improvement in dataset quality over time
04:49PM EDT - Handling non-scriptable objects
04:49PM EDT - eventually outperform humans
04:49PM EDT - understanding failure is critical to learn good behaviour
04:49PM EDT - Need to train on clean examples but also bad data to observe failure
04:48PM EDT - Behaviour with adversarial environments
04:48PM EDT - Everything that the robot does with all this data is stored, and used for learning in future iterations
04:47PM EDT - Batch RL
04:47PM EDT - With protein folding or robotics it can be difficult to decide how close you are to the goal, so learn programs that assign rewards
04:46PM EDT - humans annotate random attempts to indicate where the rewards are
04:45PM EDT - Initiate with as good data as possible
04:45PM EDT - failed experiments, random policies, interferences
04:45PM EDT - Neverending storage
04:45PM EDT - Never throw any data away, no matter how bad it is
04:44PM EDT - Scale up reinforcement learning in robotics
04:44PM EDT - Reward Sketching - listing human preferences
04:43PM EDT - train networks against future values of themselves
04:43PM EDT - predicting which future states give the best reward
04:43PM EDT - future rewards exponentially decay
04:43PM EDT - How to measure success, as in the real world
04:42PM EDT - All about the value function
04:42PM EDT - Maximise total reward during lifetime of agent
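The value-function entries above compress a fair amount, so here is a minimal sketch (ours, not from the slides) of the two ideas: a return in which future rewards decay exponentially with a discount factor, and a temporal-difference update that trains a value estimate against its own estimate of the next state. The tabular dictionary, the state names, `gamma` and `learning_rate` are all illustrative assumptions; the talk is about neural networks rather than lookup tables.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward t steps ahead is weighted by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def td_update(values, state, reward, next_state, gamma, learning_rate):
    """Move values[state] towards reward + gamma * values[next_state].

    The target uses the current estimate for the next state, which is the
    sense in which the estimator is trained against its own future values.
    """
    target = reward + gamma * values[next_state]
    values[state] += learning_rate * (target - values[state])
    return values[state]


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))

values = {"start": 0.0, "goal": 1.0}  # hypothetical two-state example
td_update(values, "start", reward=1.0, next_state="goal",
          gamma=0.9, learning_rate=0.5)
print(values)
```

With `gamma` below 1 a reward far in the future contributes almost nothing, which keeps the total finite over a long agent lifetime.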
04:41PM EDT - Make good decisions by learning from experience
04:41PM EDT - Reinforcement learning
04:41PM EDT - Networks are growing 3x per year on average
04:41PM EDT - More diverse data, bigger network, more compute, gives better results
04:40PM EDT - Iron Law of Deep Learning: More is More
04:40PM EDT - Supervised DL - inferring knowledge from observations
04:40PM EDT - generalise to apply to new interactions
04:40PM EDT - Recipes to train programs
04:39PM EDT - Machine Learning is about creating new knowledge, using the present knowledge, to solve a large diversity of novel problems
04:39PM EDT - Performing human level or better
04:39PM EDT - Task that is unlikely to be solved by random interaction
04:38PM EDT - Sequences of low level actions
04:38PM EDT - 2019 - solving puzzles in the real world
04:38PM EDT - Some of the solutions are very human like
04:38PM EDT - Took four hours of training - minimum effort for maximum gain
04:37PM EDT - Such as playing Breakout with RL
04:37PM EDT - Physically accurate simulations as required
04:37PM EDT - Easy rules to test new approaches in parallel simulations
04:37PM EDT - Research using games
04:36PM EDT - Neuro-physical phenomena
04:36PM EDT - Neuroscience can act as a catalyst
04:36PM EDT - DM has a unique approach to AI
04:35PM EDT - Independent of Alphabet but backed by them
04:35PM EDT - Research Institute inside Alphabet, 400 researchers
04:35PM EDT - Deepmind - An Apollo Program for AI
04:34PM EDT - Intro to Deepmind
04:34PM EDT - Desire to build bigger and bigger machines
04:33PM EDT - No formal training in hardware or systems - purely a software guy
04:33PM EDT - AI research at scale
04:27PM EDT - This will likely be an update to what’s going on at Deepmind (now owned by Alphabet) and what they’re planning for the future of AI. We might get some insight as to how the company is working with other departments inside Alphabet – it has been cited that Deepmind has used its algorithms to increase the efficiency of cooling inside Google’s datacenters, for example.
04:27PM EDT - Deepmind is the company that created the AlphaGo program that played professional Go champion Lee Sedol in 2016, with the final score of 4-1 in favor of the artificial intelligence.
04:26PM EDT - Keynote for Day 2 of Hot Chips is from Dan Belov of Deepmind
