The open AI platform Hugging Face is launching its next ambitious project: an open robotics environment. Scientist Remi Cadene will build it, as he announced on X (formerly Twitter). Cadene previously worked at Tesla on the development of Autopilot and the general-purpose humanoid robot Optimus. His research focuses on the mechanisms underlying intelligence, in particular on replicating human behavior with artificial neural networks. To this end, he investigates new architectures, learning methods, and theoretical frameworks, with a particular focus on making the results and decision-making processes of AI systems explainable.
Hugging Face has already advertised another position for the new project: it is looking for a robotics engineer to develop a cost-effective robotics system based on open source and deep learning. The role's tasks include, among other things, developing algorithms for motion planning and control as well as for perception and navigation.
The logical next step
Since its founding in 2016, Hugging Face has grown into the platform of choice for developing, sharing, and benchmarking open machine learning systems. The infrastructure is operated by the French-American company of the same name. A key service is the Transformers library, which provides users with open source implementations of models for image, text, and audio tasks. In 2021, Hugging Face launched the BigScience initiative to develop an open source large language model (LLM) as a counterweight to proprietary language AIs from OpenAI, Meta, and Google. A year later, the network of independent researchers and smaller companies presented BLOOM, a model with 176 billion parameters.
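For readers unfamiliar with the library, the following minimal sketch shows how Transformers typically exposes such open models through its pipeline API; the checkpoint name is just one example of the many hosted on the Hub.

```python
# Minimal sketch: loading an open model via the Transformers pipeline API.
# The checkpoint name is an example; any compatible model from the Hub works.
from transformers import pipeline

# Text classification with a small open checkpoint
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Open robotics could be a big step for the platform."))
# -> [{'label': 'POSITIVE', 'score': ...}]
```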
With its planned robotics project, Hugging Face is taking the logical next step. Machine-learned systems are becoming increasingly multimodal: they process more and more types of input data simultaneously and produce output in different forms. Currently they primarily combine images and text, but audio, video, and sensor data are also increasingly being incorporated into the training of multimodal models. The central building block of multimodal AIs are large language models such as GPT or LLaMA, because they have already internalized fairly extensive world knowledge and can often apply it surprisingly well. This is why robotics researchers are increasingly experimenting with LLMs, which are intended to translate short user instructions into detailed action plans for robots.
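To make the idea concrete, here is a hypothetical sketch of such an instruction-to-plan step using an open instruction-tuned model from the Hub. The model name, prompt format, and action vocabulary are illustrative assumptions, not part of Hugging Face's announced project.

```python
# Hypothetical sketch: expanding a short user command into a step-by-step
# plan that a robot controller could consume. Model name, prompt wording,
# and the action vocabulary (move, grasp, place) are illustrative choices.
from transformers import pipeline

planner = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

prompt = (
    "Turn the following instruction into a numbered list of atomic robot "
    "actions (move, grasp, place):\n"
    "Instruction: Put the red cup on the shelf.\n"
    "Plan:"
)

# Deterministic decoding so the plan is reproducible
plan = planner(prompt, max_new_tokens=128, do_sample=False)
print(plan[0]["generated_text"])
```

In practice, such a plan would still have to be parsed and validated before being handed to motion planning and control, which is exactly the layer the advertised robotics engineer position is meant to cover.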
(atr)