diff --git a/omnitool/gradio/app.py b/omnitool/gradio/app.py
index 1b2c459..fdbd57c 100644
--- a/omnitool/gradio/app.py
+++ b/omnitool/gradio/app.py
@@ -26,6 +26,8 @@ CONFIG_DIR = Path("~/.anthropic").expanduser()
 API_KEY_FILE = CONFIG_DIR / "api_key"
 INTRO_TEXT = '''
+OmniTool Header
+
 Welcome to OmniTool - the OmniParser+X Computer Use Demo! X = [OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL) or Anthropic Computer Use (Sonnet)]. OmniParser lets you turn any vision-langauge model into an AI agent.
diff --git a/omnitool/readme.md b/omnitool/readme.md
index a017a8c..ecda499 100644
--- a/omnitool/readme.md
+++ b/omnitool/readme.md
@@ -9,16 +9,16 @@ Control a Windows 11 VM with OmniParser+X (OpenAI (4o/o1/o3-mini), DeepSeek (R1)
 There are three components:
-
-
-
-
-
+
+
+
+
+
@@ -26,31 +26,15 @@ There are three components:
omnibox: A Windows 11 VM running in a Docker container.
omniparserserver: FastAPI server running OmniParser V2.
omnibox: A Windows 11 VM running in a Docker container.
gradio
-
-
 Notes:
-1. The Windows 11 VM docker is dependent on KVM so can only run quickly on Windows and Linux. This can run on a CPU machine (doesn't need GPU).
-2. Though OmniParser can run on a CPU, we have separated this out if you want to run it fast on a GPU machine
+1. Though OmniParser can run on a CPU, we have separated this out if you want to run it fast on a GPU machine
+2. The Windows 11 VM docker is dependent on KVM so can only run quickly on Windows and Linux. This can run on a CPU machine (doesn't need GPU).
 3. The Gradio UI can also run on a CPU machine. We suggest running **omnibox** and **gradio** on the same CPU machine and **omniparserserver** on a GPU server.
 
 ## Setup
 
-1. **omnibox**:
-
-   a. Install Docker Desktop
-
-   b. Visit [Microsoft Evaluation Center](https://info.microsoft.com/ww-landing-windows-11-enterprise.html), accept the Terms of Service, and download a **Windows 11 Enterprise Evaluation (90-day trial, English, United States)** ISO file [~6GB]. Rename the file to `custom.iso` and copy it to the directory `OmniParser/omnitool/omnibox/vm/win11iso`
-
-   c. Navigate to vm management script directory with`cd OmniParser/omnitool/omnibox/scripts`
-
-   d. Build the docker container [400MB] and install the ISO to a storage folder [20GB] with `./manage_vm.sh create`
-
-   e. After creating the first time it will store a save of the VM state in `vm/win11storage`. You can then manage the VM with `./manage_vm.sh start` and `./manage_vm.sh stop`. To delete the VM, use `./manage_vm.sh delete` and delete the `OmniParser/omnitool/omnibox/vm/win11storage` directory.
-
-2. **omniparserserver**:
+1. **omniparserserver**:
 
    a. If you already have a conda environment for OmniParser, you can use that. Else follow the following steps to create one
 
@@ -73,6 +57,18 @@ Notes:
 
   i. Start the server with `python -m omniparserserver`
 
+2. **omnibox**:
+
+   a. Install Docker Desktop
+
+   b. Visit [Microsoft Evaluation Center](https://info.microsoft.com/ww-landing-windows-11-enterprise.html), accept the Terms of Service, and download a **Windows 11 Enterprise Evaluation (90-day trial, English, United States)** ISO file [~6GB]. Rename the file to `custom.iso` and copy it to the directory `OmniParser/omnitool/omnibox/vm/win11iso`
+
+   c. Navigate to vm management script directory with`cd OmniParser/omnitool/omnibox/scripts`
+
+   d. Build the docker container [400MB] and install the ISO to a storage folder [20GB] with `./manage_vm.sh create`
+
+   e. After creating the first time it will store a save of the VM state in `vm/win11storage`. You can then manage the VM with `./manage_vm.sh start` and `./manage_vm.sh stop`. To delete the VM, use `./manage_vm.sh delete` and delete the `OmniParser/omnitool/omnibox/vm/win11storage` directory.
+
 3. **gradio**:
 
    a. Navigate to the gradio directory with `cd OmniParser/omnitool/gradio`
 
@@ -81,4 +77,4 @@ Notes:
 
   c. Start the server with `python app.py --windows_host_url localhost:8006 --omniparser_server_url localhost:8000`
 
-   d. Open the URL in the terminal output, set your API Key from OpenAI and start playing with the AI agent!
+   d. Open the URL in the terminal output, set your API Key and start playing with the AI agent!