Complete Book & Media Supply, LLC.

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

AUTHOR	Yang, Zhengyuan; Gan, Zhe; Li, Chunyuan
PUBLISHER	Now Publishers (05/06/2024)
PRODUCT TYPE	Paperback (Paperback)

Description

This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants.

The focus encompasses five core topics, categorized into two classes: (i) a survey of well-established research areas, namely multimodal foundation models pre-trained for specific purposes, covering two topics - methods of learning vision backbones for visual understanding, and text-to-image generation; and (ii) recent advances in exploratory, open research areas, namely multimodal foundation models that aim to play the role of general-purpose assistants, covering three topics - unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs.

The target audience of the monograph is researchers, graduate students, and professionals in the computer vision and vision-language multimodal communities who are eager to learn the basics and recent advances in multimodal foundation models.

Product Details

ISBN-13: 9781638283362

ISBN-10: 1638283362

Binding: Paperback or Softback (Trade Paperback (US))

Content Language: English

More Product Details

Page Count: 230

Carton Quantity: 34

Product Dimensions: 6.14 x 0.48 x 9.21 inches

Weight: 0.72 pound(s)

Country of Origin: US

Subject Information

BISAC Categories

Computers | Software Development & Engineering - Computer Graphics

Computers | Artificial Intelligence - Computer Vision & Pattern Recognition

Computers | User Interfaces

List Price $99.00

Your Price $98.01

Out of Stock
