Robots & AI in 2024

MetaAI tells us: “Optimistic science fiction typically imagines a future where humans create art and pursue fulfilling pastimes while AI-enabled robots handle dull or dangerous tasks. In contrast, the AI systems of today display increasingly sophisticated generative abilities on ostensible creative tasks. But where are the robots? This gap is known as Moravec’s paradox, the thesis that the hardest problems in AI involve sensorimotor skills, not abstract thought or reasoning. To put it another way, ‘The hard problems are easy, and the easy problems are hard.’”

Roughly, where are we?

On the software side, the progress is impressive. GPT-4, for example, can combine conversation (Whisper), vision (GPT-4V), and image generation (DALL-E 3). Looking only at generation (images, video, audio), we are not far from having entire films, tailored to our tastes, generated automatically. A good example is this Reddit post (see the “community note” too), or all the commotion around Emily Pellegrini, a recent “AI influencer” (a somewhat more technical discussion here).

it’s so over pic.twitter.com/KxzRLezgcL
— Justine Moore (@venturetwins) January 4, 2024

On the hardware side, meaning things that interact with the physical world, the story is different. Until last year, and outside of a few publications such as “PaLM-SayCan” (Google + Everyday Robots), we had not seen robots that could solve tasks in a truly autonomous way. Although Boston Dynamics' (impressive) demos are mostly tele-operated, they recently released a prototype that combined several models so that their robot dog (Spot) could solve some tasks based on instructions from a human. More on this below.

Figure: Diagram of the overall system.

Hardware setup for the tour guide: Spot EAP 2, Respeaker V2, Bluetooth speaker, Spot Arm and gripper camera.

Then there is Tesla, which went from showing a guy dressed as a robot dancing a little over 2 years ago, to autonomously sorting objects a few months ago, to announcing Optimus Gen-2 a few weeks ago.

There’s a new bot in town 🤖

Check this out (until the very end)! https://t.co/duFdhwNe3K pic.twitter.com/8pbhwW0WNc
— Tesla Optimus (@Tesla_Optimus) December 13, 2023

Finally, Figure (master plan here) has just learned to make coffee by watching 10 hours of humans making coffee.

Figure-01 has learned to make coffee ☕️

Our AI learned this after watching humans make coffee

This is end-to-end AI: our neural networks are taking video in, trajectories out

Join us to train our robot fleet: https://t.co/egQy3iz3Ky pic.twitter.com/Y0ksEoHZsW
— Brett Adcock (@adcock_brett) January 7, 2024

So, what is there in `open source`?

In recent days, a three-person team (Zipeng Fu, Tony Zhao, Chelsea Finn) from Google DeepMind and Stanford published Mobile-ALOHA (“Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation”), where they show a robot that, with 50 demonstrations per task, learned to autonomously do things like cook shrimp, open a cabinet to store heavy pots, call and ride an elevator, and wash a used pan at a faucet. All of this runs on a robot that cost less than $32,000.

Introduce 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Learning!

With 50 demos, our robot can autonomously complete complex mobile manipulation tasks:
- cook and serve shrimp🦐
- call and take elevator🛗
- store a 3Ibs pot to a two-door cabinet

Open-sourced!

Co-led @tonyzzhao, @chelseabfinn pic.twitter.com/wQ2BLDLhAw
— Zipeng Fu (@zipengfu) January 3, 2024

A list of some useful links:

This research builds on their earlier work, “ALOHA 🏖: 𝐀 𝐋ow-cost 𝐎pen-source 𝐇𝐀rdware System for Bimanual Teleoperation”.

Introducing ALOHA 🏖: 𝐀 𝐋ow-cost 𝐎pen-source 𝐇𝐀rdware System for Bimanual Teleoperation

After 8 months iterating @stanford and 2 months working with beta users, we are finally ready to release it!

Here is what ALOHA is capable of: pic.twitter.com/lR7gLOgwTZ
— Tony Z. Zhao (@tonyzzhao) March 27, 2023

Jim Fan (NVIDIA, OpenAI):

What did I tell you a few days ago? 2024 is the year of robotics. Mobile-ALOHA is an open-source robot hardware that can do dexterous, bimanual tasks like cooking a meal (with human teleoperation). Very soon, hardware will no longer bottleneck us on the quest for human-level,… pic.twitter.com/vMi3XkqKeh
— Jim Fan (@DrJimFan) January 4, 2024

Google DeepMind has just announced AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents, “a system that leverages existing foundation models to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision”. They even have it draw on Asimov's Three Laws when reasoning about how to interact with the environment. More on this page.

These robots are coordinated by an LLM that decides what tasks to try, where to try it, how to obey a "constitution" with some "Asimov-inspired" laws. Check out more about AutoRT here: https://t.co/PitohaTbdt
And in the blog post: https://t.co/OaINc4DGln

Here is a thread 👇 pic.twitter.com/2Mk5AhtDdb
— Sergey Levine (@svlevine) January 5, 2024

Meanwhile, UT-Austin published imitation learning research more than 2 years ago with VIOLA, “an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation.”

If you want to learn more about how the task has motivated a line of research in manipulation, see the list:
- VIOLA: https://t.co/g60edoV383
- HYDRA: https://t.co/nkOxBBnIxe
- AWE: https://t.co/VzbuLse0JI
- HITL-TAMP: https://t.co/lWDGlGDsLl
- MimicGen: https://t.co/bLU3lJtvWb https://t.co/roRcmyNbVF
— Yifeng Zhu 朱毅枫 (@yifengzhu_ut) January 7, 2024

Finally, MetaAI also published an article a while ago called “Robots that learn from videos of human activities and simulated interactions” where, in a collaboration with Boston Dynamics, they present VC-1, an artificial visual cortex.

Contemporary discussion (hype?) about LLMs and “pausing AGI development” seems oblivious of Moravec’s paradox.

We’ve hypothesized since the 80s — that the hardest problems in AI involve sensorimotor control, not abstract thought or reasoning.

It https://t.co/kK7bkCDcq1… pic.twitter.com/OlFzhzsSbH
— Dhruv Batra (@DhruvBatraDB) March 31, 2023

More things to keep reading

MetaAI presents Robo-Affordances: “Robotics faces a chicken and egg problem: there is no web-scale robot data for training since robots are not yet deployed, and vice-versa. Our solution (VRB) is to use large-scale human videos to train a general-purpose affordance model to jumpstart any robotics setting”.

3 mo. ago we released the Open X-Embodiment dataset, today we’re doing the next step:
Introducing Octo 🐙, a generalist robot policy, trained on 800k robot trajectories, stronger than RT-1X, flexible observation + action spaces, fully open source!
💻: https://t.co/5CGbqEqdn5

/🧵 pic.twitter.com/QMj8ChVr95
— Karl Pertsch (@KarlPertsch) December 14, 2023

Introducing Robo360 dataset 🚀, the first real-world omnispective multi-view and multi-material robotic manipulation dataset. Robo360 captures synchronized multi-modal robot-object interaction data (video, audio, proprioception, control) to facilitate research in dynamic… pic.twitter.com/6Tuw4uPR85
— Litian Liang (@litian_liang) January 5, 2024
