From fa1e600ec0d8444949507858ea10a056c096d5b3 Mon Sep 17 00:00:00 2001
From: yadonglu
Date: Wed, 12 Feb 2025 23:45:41 -0800
Subject: [PATCH] add video

---
 README.md          | 4 ++--
 omnitool/readme.md | 5 +++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index eac686b..3034622 100644
--- a/README.md
+++ b/README.md
@@ -7,12 +7,12 @@
 [![arXiv](https://img.shields.io/badge/Paper-green)](https://arxiv.org/abs/2408.00203)
 [![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
-📢 [[Project Page](https://microsoft.github.io/OmniParser/)] [[Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)] [[Models V2](https://huggingface.co/microsoft/OmniParser-v2.0)] [[Models](https://huggingface.co/microsoft/OmniParser)] [[huggingface space](https://huggingface.co/spaces/microsoft/OmniParser)]
+📢 [[Project Page](https://microsoft.github.io/OmniParser/)] [[V2 Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)] [[Models V2](https://huggingface.co/microsoft/OmniParser-v2.0)] [[Models](https://huggingface.co/microsoft/OmniParser)] [[huggingface space](https://huggingface.co/spaces/microsoft/OmniParser)]
 
 **OmniParser** is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
 
 ## News
-- [2025/2] We release V2 [checkpoints](https://huggingface.co/microsoft/OmniParser-v2.0)
+- [2025/2] We release OmniParser V2 [checkpoints](https://huggingface.co/microsoft/OmniParser-v2.0).
 - [2025/2] We introduce OmniTool: Control a Windows 11 VM with OmniParser + your vision model of choice. OmniTool supports out of the box the following large language models - OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL) or Anthropic Computer Use.
 - [2025/1] V2 is coming. We achieve new state of the art results 39.5% on the new grounding benchmark [Screen Spot Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/tree/main) with OmniParser v2 (will be released soon)! Read more details [here](https://github.com/microsoft/OmniParser/tree/master/docs/Evaluation.md).
 - [2024/11] We release an updated version, OmniParser V1.5 which features 1) more fine grained/small icon detection, 2) prediction of whether each screen element is interactable or not. Examples in the demo.ipynb.
diff --git a/omnitool/readme.md b/omnitool/readme.md
index 82c215c..03e0d88 100644
--- a/omnitool/readme.md
+++ b/omnitool/readme.md
@@ -32,6 +32,11 @@ There are three components:
+## Showcase Video
+| OmniParser V2 |