From fa1e600ec0d8444949507858ea10a056c096d5b3 Mon Sep 17 00:00:00 2001
From: yadonglu <yadonglu@micrsoft.com>
Date: Wed, 12 Feb 2025 23:45:41 -0800
Subject: [PATCH] add video

---
 README.md          | 4 ++--
 omnitool/readme.md | 5 +++++
 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index eac686b..3034622 100644
--- a/README.md
+++ b/README.md
@@ -7,12 +7,12 @@
 [![arXiv](https://img.shields.io/badge/Paper-green)](https://arxiv.org/abs/2408.00203)
 [![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
-📢 [[Project Page](https://microsoft.github.io/OmniParser/)] [[Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)] [[Models V2](https://huggingface.co/microsoft/OmniParser-v2.0)] [[Models](https://huggingface.co/microsoft/OmniParser)] [[huggingface space](https://huggingface.co/spaces/microsoft/OmniParser)]
+📢 [[Project Page](https://microsoft.github.io/OmniParser/)] [[V2 Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)] [[Models V2](https://huggingface.co/microsoft/OmniParser-v2.0)] [[Models](https://huggingface.co/microsoft/OmniParser)] [[huggingface space](https://huggingface.co/spaces/microsoft/OmniParser)]
 
 **OmniParser** is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface. 
 
 ## News
-- [2025/2] We release V2 [checkpoints](https://huggingface.co/microsoft/OmniParser-v2.0) 
+- [2025/2] We release OmniParser V2 [checkpoints](https://huggingface.co/microsoft/OmniParser-v2.0). 
 - [2025/2] We introduce OmniTool: Control a Windows 11 VM with OmniParser + your vision model of choice. OmniTool supports out of the box the following large language models - OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL) or Anthropic Computer Use. 
 - [2025/1] V2 is coming. We achieve new state of the art results 39.5% on the new grounding benchmark [Screen Spot Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/tree/main) with OmniParser v2 (will be released soon)! Read more details [here](https://github.com/microsoft/OmniParser/tree/master/docs/Evaluation.md).
 - [2024/11] We release an updated version, OmniParser V1.5 which features 1) more fine grained/small icon detection, 2) prediction of whether each screen element is interactable or not. Examples in the demo.ipynb. 
diff --git a/omnitool/readme.md b/omnitool/readme.md
index 82c215c..03e0d88 100644
--- a/omnitool/readme.md
+++ b/omnitool/readme.md
@@ -32,6 +32,11 @@ There are three components:
   </tr>
 </table>
 
+## Showcase Video
+| OmniParser V2 | OmniTool |
+|---|---|
+| <video src="https://1drv.ms/v/c/650b027c18d5a573/EWXbVESKWo9Buu6OYCwg06wBeoM97C6EOTG6RjvWLEN1Qg?e=alnHGC" height="300" /> | <video src="https://1drv.ms/v/c/650b027c18d5a573/EehZ7RzY69ZHn-MeQHrnnR4BCj3by-cLLpUVlxMjF4O65Q?e=8LxMgX" height="300" /> |
+
 ## Notes:
 
 1. Though **OmniParser V2** can run on a CPU, we have separated this out if you want to run it fast on a GPU machine