Naming conventions

Thomas Dhome-Casanova
2025-02-04 11:43:36 -08:00
parent 31d7b1d096
commit fe84a35292
39 changed files with 35 additions and 33 deletions

imgs/header_bar.png: new binary image file (210 KiB)

@@ -26,11 +26,11 @@ CONFIG_DIR = Path("~/.anthropic").expanduser()
API_KEY_FILE = CONFIG_DIR / "api_key"
INTRO_TEXT = '''
🚀🤖 It's Play Time!
Welcome to OmniTool - the OmniParser+X Computer Use Demo! X = [OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL) or Anthropic Computer Use (Sonnet)].
Welcome to the OmniParser+X Computer Use Demo! X = [GPT family (4o/o1/o3-mini), Claude, deepseek R1/V3, Qwen-2.5VL]. Let OmniParser turn your general-purpose vision-language model into an AI agent.
OmniParser lets you turn any vision-language model into an AI agent.
Type a message and press submit to start OmniParser+X. Press stop to pause, and press the trash icon in the chat to clear the message history.
Type a message and press submit to start OmniTool. Press stop to pause, and press the trash icon in the chat to clear the message history.
'''
def parse_arguments():
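The hunk above ends at `parse_arguments()`, and its body is not shown in this diff. The setup steps start the app with `python app.py --windows_host_url localhost:8006 --omniparser_server_url localhost:8000`. A minimal sketch of a parser that accepts those two flags might look like the following. The default values (copied from that command) and the `argv` parameter are assumptions, not the app's actual implementation.

```python
import argparse
from typing import Optional, Sequence


def parse_arguments(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Hypothetical sketch: parse the two server addresses passed to app.py."""
    parser = argparse.ArgumentParser(description="OmniTool Gradio UI (sketch)")
    # Assumed defaults, taken from the example invocation in the setup steps.
    parser.add_argument(
        "--windows_host_url",
        default="localhost:8006",
        help="host:port of the Windows 11 VM (omnibox)",
    )
    parser.add_argument(
        "--omniparser_server_url",
        default="localhost:8000",
        help="host:port of the OmniParser FastAPI server",
    )
    # With argv=None, argparse reads sys.argv[1:], as it would under `python app.py ...`.
    return parser.parse_args(argv)
```

Passing an explicit list, for example `parse_arguments(["--omniparser_server_url", "gpu-box:8000"])`, makes the parser easy to exercise without a command line.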
@@ -271,7 +271,7 @@ with gr.Blocks(theme=gr.themes.Default()) as demo:
setup_state(state.value)
gr.Markdown("# OmniParser + ✖️ Demo")
gr.Markdown("# OmniTool")
if not os.getenv("HIDE_WARNING", False):
gr.Markdown(INTRO_TEXT)
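The intro text is gated on `os.getenv("HIDE_WARNING", False)`. Whenever the variable is set, `os.getenv` returns its raw string, so any non-empty value hides the intro, including `"0"` or `"false"`. If stricter parsing were wanted, a small helper like the one below would do it. `env_flag` and the set of accepted spellings are illustrative assumptions and are not part of the app.

```python
import os

# Illustrative set of spellings treated as "true"; anything else counts as false.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Return True only if the environment variable holds a recognised 'true' spelling."""
    raw = os.getenv(name)
    if raw is None:
        # Variable not set at all: fall back to the caller's default.
        return default
    return raw.strip().lower() in _TRUE_VALUES


# Hypothetical drop-in for the check in the hunk above:
# if not env_flag("HIDE_WARNING"):
#     gr.Markdown(INTRO_TEXT)
```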

Binary image changed (Before: 3.1 KiB, After: 3.1 KiB)

@@ -1,66 +1,68 @@
# OmniParser+X Computer Use Demo
Control a Windows 11 VM with OmniParser+X (X = [GPT family (4o/o1/o3-mini), Claude, deepseek R1/V3, Qwen-2.5VL]).
<p align="center">
<img src="../imgs/som_overlaid_omni.png" alt="OmniParser+X Computer Use Demo screenshot">
<img src="../imgs/header_bar.png" alt="OmniParser+X Computer Use Demo screenshot">
</p>
# OmniTool
Control a Windows 11 VM with OmniParser+X (OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL)) or Anthropic Computer Use.
## Overview
There are three components:
1. **windowshost**: A Windows 11 VM running in a Docker container
2. **omniparserserver**: FastAPI server running OmniParser
1. **omnibox**: A Windows 11 VM running in a Docker container
2. **omniparserserver**: FastAPI server running OmniParser V2
3. **gradio**: UI where you can provide commands and watch OmniParser+X reason about and execute them on the Windows 11 VM
Notes:
1. The Windows 11 VM Docker container depends on KVM, so it only runs quickly on Windows and Linux. It can run on a CPU machine (no GPU needed).
2. Although OmniParser can run on a CPU, we have separated it into its own server in case you want to run it fast on a GPU machine.
3. The Gradio UI can also run on a CPU machine.
3. The Gradio UI can also run on a CPU machine. We suggest running **omnibox** and **gradio** on the same CPU machine and **omniparserserver** on a GPU server.
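With this split, **omnibox** and **gradio** share a CPU machine and **omniparserserver** runs on a GPU server, so the Gradio app has to reach both services over the network. Before launching the UI, a quick TCP check like the sketch below can confirm that each `host:port` accepts connections. The function name and the example endpoints are assumptions; ports 8006 and 8000 come from the gradio command in the setup steps. An open port does not prove that the service behind it is healthy.

```python
import socket


def is_reachable(host_port: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds within `timeout` seconds."""
    host, _, port = host_port.rpartition(":")
    try:
        # The connection is closed immediately; we only care that it was accepted.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError, OverflowError):
        # Refused, timed out, unresolvable host, or a malformed/out-of-range port.
        return False


# Hypothetical usage: VM on this CPU machine, OmniParser on a separate GPU server.
# for url in ("localhost:8006", "gpu-server.example:8000"):
#     print(url, "up" if is_reachable(url) else "unreachable")
```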
## Setup
1. **windowshost**:
1. **omnibox**:
a. Install Docker Desktop
b. Visit [Microsoft Evaluation Center](https://info.microsoft.com/ww-landing-windows-11-enterprise.html), accept the Terms of Service, and download a **Windows 11 Enterprise Evaluation (90-day trial, English, United States)** ISO file [~6GB]. Rename the file to `custom.iso` and copy it to the directory `OmniParser/computer_use_demo/windowshost/vm/win11iso`
c. Navigate to the VM management script directory with `cd OmniParser/computer_use_demo/windowshost/scripts`
b. Visit [Microsoft Evaluation Center](https://info.microsoft.com/ww-landing-windows-11-enterprise.html), accept the Terms of Service, and download a **Windows 11 Enterprise Evaluation (90-day trial, English, United States)** ISO file [~6GB]. Rename the file to `custom.iso` and copy it to the directory `OmniParser/omnitool/omnibox/vm/win11iso`
c. Navigate to the VM management script directory with `cd OmniParser/omnitool/omnibox/scripts`
d. Build the docker container [400MB] and install the ISO to a storage folder [20GB] with `./manage_vm.sh create`
e. After the first creation, the VM state is saved in `vm/win11storage`. You can then manage the VM with `./manage_vm.sh start` and `./manage_vm.sh stop`. To delete the VM, use `./manage_vm.sh delete` and delete the `OmniParser/computer_use_demo/windowshost/vm/win11storage` directory.
e. After the first creation, the VM state is saved in `vm/win11storage`. You can then manage the VM with `./manage_vm.sh start` and `./manage_vm.sh stop`. To delete the VM, use `./manage_vm.sh delete` and delete the `OmniParser/omnitool/omnibox/vm/win11storage` directory.
2. **omniparserserver**:
a. If you already have a conda environment for OmniParser, you can use that. Otherwise, follow these steps to create one:
b. Ensure conda is installed with `conda --version` or install from the [Anaconda website](https://www.anaconda.com/download/success)
c. Navigate to the root of the repo with `cd OmniParser`
d. Create a conda python environment with `conda create -n "omni" python==3.12`
e. Set the python environment to be used with `conda activate omni`
f. Install the dependencies with `pip install -r requirements.txt`
g. Continue from here if you already had the conda environment.
h. Ensure the weights are downloaded into the `weights` folder. If not, download them with:
`for folder in icon_caption_florence icon_detect icon_detect_v1_5; do huggingface-cli download microsoft/OmniParser --local-dir weights/ --repo-type model --include "$folder/*"; done`
i. Navigate to the server directory with `cd OmniParser/computer_use_demo/omniparserserver`
i. Navigate to the server directory with `cd OmniParser/omnitool/omniparserserver`
j. Start the server with `python -m omniparserserver`
3. **gradio**:
a. Navigate to the gradio directory with `cd OmniParser/computer_use_demo/gradio`
a. Navigate to the gradio directory with `cd OmniParser/omnitool/gradio`
b. Ensure you have activated the conda python environment with `conda activate omni`
c. Start the server with `python app.py --windows_host_url localhost:8006 --omniparser_server_url localhost:8000`
d. Open the URL in the terminal output, set your OpenAI API key, and start playing with the AI agent!
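Step h of the **omniparserserver** setup downloads three weight folders. Before starting the server, a check like the sketch below can report which of them are missing under `weights/`. It only tests that each folder exists and is non-empty, which is an assumption about what "downloaded" means. The optional download helper assumes the `huggingface_hub` package is installed (it also provides `huggingface-cli`) and mirrors the shell loop using `snapshot_download`. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Folder names taken from the huggingface-cli loop in step h.
WEIGHT_FOLDERS = ("icon_caption_florence", "icon_detect", "icon_detect_v1_5")


def missing_weight_folders(
    weights_dir: Path, folders: Iterable[str] = WEIGHT_FOLDERS
) -> List[str]:
    """Return the names of folders that are absent or empty under `weights_dir`."""
    missing = []
    for name in folders:
        folder = weights_dir / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


def download_weights(weights_dir: Path, folders: Iterable[str]) -> None:
    """Fetch the given folders from the microsoft/OmniParser model repo (needs network)."""
    # Imported lazily so the check above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="microsoft/OmniParser",
        repo_type="model",
        local_dir=str(weights_dir),
        allow_patterns=[f"{name}/*" for name in folders],
    )


# Hypothetical usage from the repo root:
# missing = missing_weight_folders(Path("weights"))
# if missing:
#     download_weights(Path("weights"), missing)
```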