diff --git a/omnitool/readme.md b/omnitool/readme.md
index 3a501aa..2a8b529 100644
--- a/omnitool/readme.md
+++ b/omnitool/readme.md
@@ -2,7 +2,13 @@
 
 # OmniTool
 
-Control a Windows 11 VM with OmniParser+X (OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL)) or Anthropic Computer Use.
+Control a Windows 11 VM with OmniParser + your vision model of choice.
+
+## Highlights:
+
+1. **OmniParser V2** is 60% faster than V1 and now understands a wide variety of OS, app, and in-app icons!
+2. **OmniBox** uses 50% less disk space than other Windows VMs for agent testing, whilst providing the same computer use API.
+3. **OmniTool** supports the following vision models out of the box: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), and Anthropic Computer Use.
 
 ## Overview
 
@@ -26,11 +32,8 @@ There are three components:
 
 
 
-## Highlights:
-1. **OmniParser V2** is 60% faster than V1 and now understands a wide variety of OS, app and inside app icons!
-2. **OmniBox** uses 50% less disk space than other Windows VMs for agent testing, whilst providing the same computer use API
-
 ## Notes:
+
 1. Though **OmniParser V2** can run on a CPU, we have separated this out if you want to run it fast on a GPU machine
 2. The **OmniBox** Windows 11 VM docker is dependent on KVM so can only run quickly on Windows and Linux. This can run on a CPU machine (doesn't need GPU).
 3. The Gradio UI can also run on a CPU machine. We suggest running **omnibox** and **gradio** on the same CPU machine and **omniparserserver** on a GPU server.