Imagine if you could simply tell your AI to complete a task – and it would decide for itself how, and with which model, to achieve the result. This is now possible thanks to HuggingGPT, an innovative framework that uses large language models to coordinate different AI models and solve complex tasks, drawing on the Hugging Face community.
HuggingGPT uses ChatGPT to decompose user requests into actionable tasks and then connects various AI models from the Hugging Face platform to solve the scheduled tasks. It selects expert models such as T5, BERT and GPT-2 Large based on their descriptions.
The working process of HuggingGPT can be divided into four phases:
- Task Planning: ChatGPT analyzes user requests to understand their intent and decomposes them, via prompts, into solvable tasks.
- Model Selection: Based on model descriptions, ChatGPT selects expert models hosted on Hugging Face to solve the planned tasks.
- Task Execution: Each selected model is called and executed, and the results are returned to ChatGPT.
- Answer Generation: Finally, ChatGPT uses the predictions of all models to generate an answer for the user.
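The four phases can be sketched as a minimal pipeline. To be clear, this is an illustrative stand-in, not the actual HuggingGPT implementation: the planner and executor below are simple rule-based stubs where the real system calls ChatGPT and hosted Hugging Face models, and the model registry is hypothetical.

```python
# Minimal sketch of the four-phase HuggingGPT workflow.
# Planner, selector, and executor are rule-based stand-ins for the
# ChatGPT calls and Hugging Face model invocations the real system makes.

# Hypothetical registry of expert models with short descriptions,
# mimicking the model cards HuggingGPT selects from.
MODEL_REGISTRY = {
    "image-captioning": ("nlpconnect/vit-gpt2-image-captioning",
                         "generates a caption for an image"),
    "text-to-speech": ("facebook/fastspeech2-en-ljspeech",
                       "synthesizes speech from text"),
}

def plan_tasks(request: str) -> list[dict]:
    """Phase 1 (Task Planning): decompose the request into tasks.
    The real system prompts ChatGPT; this stub keyword-matches."""
    tasks = []
    if "caption" in request or "describe" in request:
        tasks.append({"task": "image-captioning"})
    if "read it aloud" in request or "speech" in request:
        tasks.append({"task": "text-to-speech"})
    return tasks

def select_model(task: dict) -> str:
    """Phase 2 (Model Selection): pick a model by its description."""
    model_id, _description = MODEL_REGISTRY[task["task"]]
    return model_id

def execute(task: dict, model_id: str) -> str:
    """Phase 3 (Task Execution): run the model, return its result.
    Stubbed here; the real system calls the hosted model."""
    return f"<output of {model_id} for {task['task']}>"

def generate_answer(request: str, results: list[str]) -> str:
    """Phase 4 (Answer Generation): fuse all results into one reply.
    The real system asks ChatGPT to summarize the predictions."""
    return f"Answer to '{request}': " + "; ".join(results)

def hugginggpt(request: str) -> str:
    tasks = plan_tasks(request)
    results = [execute(t, select_model(t)) for t in tasks]
    return generate_answer(request, results)
```

Calling `hugginggpt("caption this photo and read it aloud")` walks through all four phases: the request is split into a captioning task and a text-to-speech task, a model is chosen for each, and the stubbed results are fused into one answer string.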
Some examples of how HuggingGPT works
- Example 1: HuggingGPT decomposes abstract user requests into concrete tasks such as pose detection, image captioning, and pose-conditioned image generation. It also detects dependencies between the tasks and uses the results of the prerequisite tasks to fill in the input arguments of subsequent tasks.
- Example 2: HuggingGPT can conduct conversations in both audio and video formats. In both cases, it uses expert models to perform the text-to-audio and text-to-video tasks requested by the user, and it organizes how the models work together and how the tasks depend on each other.
- Example 3: HuggingGPT integrates multiple user input resources to perform simple reasoning. Even with several input resources, it can decompose the main task into basic subtasks, and then combine the inference results of the different models to find the correct answer.
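The dependency handling described in Example 1 can be illustrated with a small resolver: later tasks reference the output of earlier ones through a placeholder, similar in spirit to HuggingGPT's resource placeholders. The task list, field names, and executor below are made up for the sketch.

```python
# Sketch: resolving inter-task dependencies by substituting earlier
# results into later tasks' arguments. The "<resource>-N" placeholder
# convention is modeled on HuggingGPT's; tasks/fields are illustrative.

def run_with_dependencies(tasks: list[dict]) -> dict[int, str]:
    """Execute tasks in order, filling placeholder arguments with the
    results of the tasks they depend on."""
    results: dict[int, str] = {}
    for task in tasks:
        args = {}
        for name, value in task["args"].items():
            if isinstance(value, str) and value.startswith("<resource>-"):
                dep_id = int(value.split("-")[1])
                args[name] = results[dep_id]  # output of a finished task
            else:
                args[name] = value
        # Stub executor: a real system would call the selected model.
        results[task["id"]] = f"{task['task']}({args})"
    return results

# Example: image generation (task 1) depends on the pose found in task 0.
tasks = [
    {"id": 0, "task": "pose-detection",
     "args": {"image": "girl.jpg"}},
    {"id": 1, "task": "pose-to-image",
     "args": {"text": "a reading boy", "pose": "<resource>-0"}},
]
```

Running the resolver on this list executes task 0 first, then injects its result as the `pose` argument of task 1, which is exactly the "fill in the input arguments" behavior from Example 1.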
HuggingGPT provides an open and continuous way to integrate various expert models without heavy prompt engineering. It addresses the challenge of collecting a large number of high-quality model descriptions, which is necessary for solving the many AI tasks that require coordinated collaboration among multiple AI models.
And now Microsoft comes with Jarvis
Yes, exactly, Jarvis as in Iron Man – let’s see if it can keep the name. In any case, Microsoft’s Jarvis is based on HuggingGPT, which explores the use of large language models such as GPT-3.5 to interact with Hugging Face’s Model Hub.
The question of how close we are to universal artificial intelligence (AI) is difficult to answer, because expert opinions vary and AI development continues to progress. Even so, Hugging Face seems to be getting close: the corresponding paper was published on March 30th: https://paperswithcode.com/paper/hugginggpt-solving-ai-tasks-with-chatgpt-and
HuggingGPT uses ChatGPT, accesses Hugging Face’s entire model database, and selects the right model automatically and independently, learning in the process. In other words, HuggingGPT uses large language models (LLMs) like ChatGPT to solve complex AI tasks: it connects different AI models and uses ChatGPT as the interface. ChatGPT “talks” to the models, plans and coordinates tasks, selects models based on their capabilities, and carries out subtasks.
HuggingGPT has already been integrated into Microsoft: see also here
This new approach could have a significant impact on the future of AI development and applications, because Jarvis leverages the extensive collection of models to perform tasks in different modalities such as language, vision, and speech.
Jarvis also works in a four-stage process:
- Task Scheduling: Based on the prompt, Jarvis plans the tasks to be performed.
- Model Selection: Jarvis identifies the appropriate open-source models from Hugging Face’s Model Hub for each task.
- Task Execution: The tasks are executed using the selected models.
- Response Generation: Jarvis collects the results and generates an answer for the user.
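Stage 2, matching a task to a model via its description, can be sketched as a simple relevance ranking. The real system hands candidate model descriptions to the LLM and lets it choose; the keyword-overlap scoring and the candidate descriptions below are only an illustration (the model IDs are real Hub repositories, but the selection logic is an assumption of this sketch).

```python
# Sketch of description-based model selection (stage 2).
# Ranks candidate models by word overlap between the task query and
# each model's description; an LLM does this job in the real system.

def select_by_description(task_query: str, candidates: dict[str, str]) -> str:
    """Return the candidate model whose description shares the most
    words with the task query."""
    query_words = set(task_query.lower().split())

    def overlap(model_id: str) -> int:
        return len(query_words & set(candidates[model_id].lower().split()))

    return max(candidates, key=overlap)

# Illustrative candidate pool: Hub model IDs with short descriptions.
CANDIDATES = {
    "Salesforce/blip-image-captioning-base":
        "generate a text caption describing an image",
    "facebook/detr-resnet-50":
        "detect objects and bounding boxes in an image",
    "openai/whisper-base":
        "transcribe speech audio to text",
}
```

With this pool, a query like "caption describing an image" ranks the captioning model first, while "transcribe speech audio" picks the Whisper model, which is the behavior stage 2 needs before handing off to stage 3.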
The system allows Jarvis to handle complex, multimodal tasks that would normally require extensive custom programming or multiple custom AI systems.
The ability to connect a powerful LLM like GPT-3.5 with Hugging Face’s Model Hub opens up a world of possibilities. Jarvis has the potential to enable applications in natural language processing, 3D image generation, or stock trading.
Jarvis represents a significant step forward in AI development. By leveraging HuggingGPT and Hugging Face’s Model Hub, Jarvis can handle a wide variety of tasks and could revolutionize the way we interact with and use AI.
By the way, Microsoft has already provided system requirements and instructions to try out Jarvis. Let’s see who makes something out of it.