OmniTool
Control a Windows 11 VM with OmniParser+X (OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL)) or Anthropic Computer Use.
Overview
There are three components:
- omnibox: A Windows 11 VM running in a Docker container
- omniparserserver: FastAPI server running OmniParser V2
- gradio: UI where you can provide commands and watch OmniParser+X reason and execute them on the Windows 11 VM
Notes:
- The Windows 11 VM Docker container depends on KVM, so it only runs quickly on Windows and Linux. It can run on a CPU machine (no GPU needed).
- Though OmniParser can run on a CPU, we have separated it into its own server in case you want to run it fast on a GPU machine.
- The Gradio UI can also run on a CPU machine. We suggest running omnibox and gradio on the same CPU machine and omniparserserver on a GPU server.
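Since the VM depends on KVM, it can help to confirm that KVM is usable before building omnibox. The following is a minimal sketch, not part of OmniTool, and it assumes a Linux host, where KVM is exposed as the /dev/kvm device node. The function name kvm_available is hypothetical. On Windows, Docker Desktop uses its own virtualization backend, so this check does not apply there as written.

```python
import os


def kvm_available(device_path: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node exists and is readable and writable.

    Assumption: a Linux host, where KVM appears as /dev/kvm.
    """
    return os.path.exists(device_path) and os.access(device_path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if kvm_available():
        print("KVM device found and accessible.")
    else:
        print("No usable /dev/kvm found; the Windows 11 VM may run slowly.")
```

If the check fails on Linux, the usual causes are virtualization being disabled in firmware or the current user lacking permission on the device node.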
Setup
- omnibox:
  a. Install Docker Desktop.
  b. Visit the Microsoft Evaluation Center, accept the Terms of Service, and download a Windows 11 Enterprise Evaluation (90-day trial, English, United States) ISO file [~6GB]. Rename the file to custom.iso and copy it to the directory OmniParser/omnitool/omnibox/vm/win11iso.
  c. Navigate to the VM management script directory with
       cd OmniParser/omnitool/omnibox/scripts
  d. Build the Docker container [400MB] and install the ISO to a storage folder [20GB] with
       ./manage_vm.sh create
  e. After the VM is created for the first time, a save of the VM state is stored in vm/win11storage. You can then manage the VM with
       ./manage_vm.sh start
     and
       ./manage_vm.sh stop
     To delete the VM, run
       ./manage_vm.sh delete
     and delete the OmniParser/omnitool/omnibox/vm/win11storage directory.
- omniparserserver:
  a. If you already have a conda environment for OmniParser, you can use it and skip to step g. Otherwise, follow steps b to f to create one.
  b. Ensure conda is installed with
       conda --version
     or install it from the Anaconda website.
  c. Navigate to the root of the repo with
       cd OmniParser
  d. Create a conda Python environment with
       conda create -n "omni" python==3.12
  e. Activate the environment with
       conda activate omni
  f. Install the dependencies with
       pip install -r requirements.txt
  g. Continue from here if you already had the conda environment.
  h. Ensure the weights are downloaded in the weights folder. If not, download them with
       for folder in icon_caption_florence icon_detect icon_detect_v1_5; do huggingface-cli download microsoft/OmniParser --local-dir weights/ --repo-type model --include "$folder/*"; done
  i. Navigate to the server directory with
       cd OmniParser/omnitool/omniparserserver
  j. Start the server with
       python -m omniparserserver
- gradio:
  a. Navigate to the gradio directory with
       cd OmniParser/omnitool/gradio
  b. Ensure you have activated the conda Python environment with
       conda activate omni
  c. Start the server with
       python app.py --windows_host_url localhost:8006 --omniparser_server_url localhost:8000
  d. Open the URL shown in the terminal output, set your API key from OpenAI, and start playing with the AI agent!
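Before starting the gradio app, it can be useful to confirm that the other two components are listening. The sketch below is an illustration rather than part of OmniTool: it only checks that a TCP connection can be opened to the addresses passed as --windows_host_url and --omniparser_server_url above, and it makes no assumption about what either server returns. The helper names split_host_port and is_reachable are hypothetical.

```python
import socket


def split_host_port(address: str) -> tuple[str, int]:
    """Split a 'host:port' string into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' can be opened."""
    host, port = split_host_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Addresses taken from the app.py command line in the gradio step.
    targets = [
        ("omnibox", "localhost:8006"),
        ("omniparserserver", "localhost:8000"),
    ]
    for name, address in targets:
        status = "reachable" if is_reachable(address) else "not reachable"
        print(f"{name} at {address}: {status}")
```

If omniparserserver runs on a separate GPU machine, replace localhost:8000 with that machine's host and port in both this check and the app.py command.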
