Vision Encoder/Decoder Model for Image to Text

Insilico Medicine launches science MMAI gym to train frontier LLMs into pharmaceutical-grade scientific engines

New “AI GYM for Science” dramatically boosts the biological and chemical intelligence of any causal or frontier LLM, ...

GitHub

Android OCR Text Recognition Scanner – Optical Character Recognition for Android (ML Kit, Tesseract, Cloud Vision)

Whether you want to build a document scanner, digitize receipts, or add text recognition to your mobile app, this project is a perfect starting point. This project is provided for educational and ...

New Apple model combines vision understanding and image generation with impressive results

Manzano combines visual understanding and text-to-image generation, while significantly reducing performance or quality trade-offs.

Apple AI research shows how MLLMs understand, generate, search for images

Apple's researchers continue to focus on multimodal LLMs, with studies exploring their use for image generation, ...

GitHub

SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model

Important Note: This repository implements SVG-T2I, a text-to-image diffusion framework that performs visual generation directly in Visual Foundation Model (VFM) representation space, rather than ...

IEEE

SAM2-MDESD: An SAM2-Assisted Multilevel Dual Encoder–Single Decoder Method for Optical Remote Sensing Image Change Detection

Abstract: Given the limitations of traditional feature coding in capturing multiscale information and precise segmentation, existing deep learning-based change detection (CD) methods often suffer from ...

IEEE

SARCLIP: The First Vision–Language Foundation Model for SAR Image

Abstract: Foundation models have achieved remarkable breakthroughs across various domains, with the widespread use of masked image modeling (MIM) and self-supervised learning (SSL). However, these models ...
