readme + gradio updates

Thomas Dhome-Casanova
2025-02-04 11:57:52 -08:00
parent 61999cef39
commit 07effd1a68
2 changed files with 23 additions and 25 deletions


@@ -26,6 +26,8 @@ CONFIG_DIR = Path("~/.anthropic").expanduser()
 API_KEY_FILE = CONFIG_DIR / "api_key"
 INTRO_TEXT = '''
+<img src="../../imgs/header_bar.png" alt="OmniTool Header" width="100%">
 Welcome to OmniTool - the OmniParser+X Computer Use Demo! X = [OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL) or Anthropic Computer Use (Sonnet)].
 OmniParser lets you turn any vision-langauge model into an AI agent.
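
For readers unfamiliar with the `CONFIG_DIR` / `API_KEY_FILE` constants visible in this hunk's context, here is a minimal sketch of how a key file under `~/.anthropic` could be read and written with `pathlib`. The helpers `load_api_key` and `save_api_key` are hypothetical names introduced for illustration; they are not taken from the commit, and the actual app may handle storage differently.

```python
from pathlib import Path
from typing import Optional

# Same constants as in the diff context above.
CONFIG_DIR = Path("~/.anthropic").expanduser()
API_KEY_FILE = CONFIG_DIR / "api_key"


def load_api_key(path: Path = API_KEY_FILE) -> Optional[str]:
    """Return the stored key, or None if the file is missing or empty.

    Hypothetical helper, not part of the commit.
    """
    try:
        key = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return key or None


def save_api_key(key: str, path: Path = API_KEY_FILE) -> None:
    """Write the key, creating the config directory if needed.

    Hypothetical helper, not part of the commit.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(key.strip(), encoding="utf-8")
    # Restrict permissions to the owner; this has limited effect on Windows.
    path.chmod(0o600)
```

Passing an explicit `path` keeps the helpers easy to exercise against a temporary directory instead of the real home directory.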


@@ -9,16 +9,16 @@ Control a Windows 11 VM with OmniParser+X (OpenAI (4o/o1/o3-mini), DeepSeek (R1)
 There are three components:
 <table style="border-collapse: collapse; border: none;">
-<tr>
-<td style="border: none;"><img src="../imgs/omniboxicon.png" width="50"></td>
-<td style="border: none;"><strong>omnibox</strong></td>
-<td style="border: none;">A Windows 11 VM running in a Docker container.</td>
-</tr>
 <tr>
 <td style="border: none;"><img src="../imgs/omniparsericon.png" width="50"></td>
 <td style="border: none;"><strong>omniparserserver</strong></td>
 <td style="border: none;">FastAPI server running OmniParser V2.</td>
 </tr>
+<tr>
+<td style="border: none;"><img src="../imgs/omniboxicon.png" width="50"></td>
+<td style="border: none;"><strong>omnibox</strong></td>
+<td style="border: none;">A Windows 11 VM running in a Docker container.</td>
+</tr>
 <tr>
 <td style="border: none;"><img src="../imgs/gradioicon.png" width="50"></td>
 <td style="border: none;"><strong>gradio</strong></td>
@@ -26,31 +26,15 @@ There are three components:
 </tr>
 </table>
-<!-- 1. **omnibox**: A Windows 11 VM running in a Docker container
-2. **omniparserserver**: FastAPI server running OmniParser V2
-3. **gradio**: UI where you can provide commands and watch OmniParser+X reasoning and executing on the Windows 11 VM -->
 Notes:
-1. The Windows 11 VM docker is dependent on KVM so can only run quickly on Windows and Linux. This can run on a CPU machine (doesn't need GPU).
-2. Though OmniParser can run on a CPU, we have separated this out if you want to run it fast on a GPU machine
+1. Though OmniParser can run on a CPU, we have separated this out if you want to run it fast on a GPU machine
+2. The Windows 11 VM docker is dependent on KVM so can only run quickly on Windows and Linux. This can run on a CPU machine (doesn't need GPU).
 3. The Gradio UI can also run on a CPU machine. We suggest running **omnibox** and **gradio** on the same CPU machine and **omniparserserver** on a GPU server.
 ## Setup
-1. **omnibox**:
-a. Install Docker Desktop
-b. Visit [Microsoft Evaluation Center](https://info.microsoft.com/ww-landing-windows-11-enterprise.html), accept the Terms of Service, and download a **Windows 11 Enterprise Evaluation (90-day trial, English, United States)** ISO file [~6GB]. Rename the file to `custom.iso` and copy it to the directory `OmniParser/omnitool/omnibox/vm/win11iso`
-c. Navigate to vm management script directory with`cd OmniParser/omnitool/omnibox/scripts`
-d. Build the docker container [400MB] and install the ISO to a storage folder [20GB] with `./manage_vm.sh create`
-e. After creating the first time it will store a save of the VM state in `vm/win11storage`. You can then manage the VM with `./manage_vm.sh start` and `./manage_vm.sh stop`. To delete the VM, use `./manage_vm.sh delete` and delete the `OmniParser/omnitool/omnibox/vm/win11storage` directory.
-2. **omniparserserver**:
+1. **omniparserserver**:
 a. If you already have a conda environment for OmniParser, you can use that. Else follow the following steps to create one
@@ -73,6 +57,18 @@ Notes:
 i. Start the server with `python -m omniparserserver`
+2. **omnibox**:
+a. Install Docker Desktop
+b. Visit [Microsoft Evaluation Center](https://info.microsoft.com/ww-landing-windows-11-enterprise.html), accept the Terms of Service, and download a **Windows 11 Enterprise Evaluation (90-day trial, English, United States)** ISO file [~6GB]. Rename the file to `custom.iso` and copy it to the directory `OmniParser/omnitool/omnibox/vm/win11iso`
+c. Navigate to vm management script directory with`cd OmniParser/omnitool/omnibox/scripts`
+d. Build the docker container [400MB] and install the ISO to a storage folder [20GB] with `./manage_vm.sh create`
+e. After creating the first time it will store a save of the VM state in `vm/win11storage`. You can then manage the VM with `./manage_vm.sh start` and `./manage_vm.sh stop`. To delete the VM, use `./manage_vm.sh delete` and delete the `OmniParser/omnitool/omnibox/vm/win11storage` directory.
 3. **gradio**:
 a. Navigate to the gradio directory with `cd OmniParser/omnitool/gradio`
@@ -81,4 +77,4 @@ Notes:
 c. Start the server with `python app.py --windows_host_url localhost:8006 --omniparser_server_url localhost:8000`
-d. Open the URL in the terminal output, set your API Key from OpenAI and start playing with the AI agent!
+d. Open the URL in the terminal output, set your API Key and start playing with the AI agent!
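
The reordered Setup steps start omniparserserver first, then omnibox, then gradio, with gradio pointed at the other two through `--windows_host_url` and `--omniparser_server_url`. As a rough illustration of that dependency, the sketch below checks whether both addresses accept a TCP connection and then assembles the `app.py` command quoted in the README. The helper names (`parse_host_port`, `is_reachable`, `build_gradio_command`) are hypothetical and not part of the repository, and an open port only suggests that a service is listening; it does not confirm that the service is healthy.

```python
import socket
from typing import List, Tuple


def parse_host_port(address: str) -> Tuple[str, int]:
    """Split a 'host:port' string into (host, port). Hypothetical helper."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def is_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds."""
    host, port = parse_host_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


def build_gradio_command(windows_host_url: str, omniparser_server_url: str) -> List[str]:
    """Assemble the app.py invocation using the two flags from the README."""
    return [
        "python", "app.py",
        "--windows_host_url", windows_host_url,
        "--omniparser_server_url", omniparser_server_url,
    ]


if __name__ == "__main__":
    # Default addresses as quoted in the README's gradio step.
    windows_host = "localhost:8006"
    omniparser = "localhost:8000"
    for name, address in (("omniparserserver", omniparser), ("omnibox", windows_host)):
        status = "reachable" if is_reachable(address) else "NOT reachable"
        print(f"{name} at {address}: {status}")
    print(" ".join(build_gradio_command(windows_host, omniparser)))
```

Run from `OmniParser/omnitool/gradio`, the printed command can be pasted into the terminal once both services report as reachable. When omniparserserver runs on a separate GPU machine, as the Notes suggest, replace `localhost:8000` with that machine's address.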