first commit

2024-09-20 15:49:30 -07:00
commit 6cd06a7a86
17 changed files with 2091 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,32 @@
+# OmniParser: Screen Parsing Tool for Pure Vision Based GUI Agent
+
+![Logo](imgs/logo.png)
+[![arXiv](https://img.shields.io/badge/Paper-green)](https://arxiv.org/abs/2408.00203)
+[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+**OmniParser** is a comprehensive method for parsing user interface screenshots into structured, easy-to-understand elements. This significantly enhances GPT-4V's ability to generate actions that are accurately grounded in the corresponding regions of the interface.
+
+## Examples
+We have put together a few simple examples in `demo.ipynb`.
+
+## Gradio Demo
+To launch the Gradio demo, run:
+```bash
+python gradio_demo.py
+```
+
+
+## 📚 Citation
+Our technical report can be found [here](https://arxiv.org/abs/2408.00203).
+If you find our work useful, please consider citing it:
+```bibtex
+@misc{lu2024omniparserpurevisionbased,
+      title={OmniParser for Pure Vision Based GUI Agent}, 
+      author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
+      year={2024},
+      eprint={2408.00203},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2408.00203}, 
+}
+```