Preparation
Follow the instructions at Using Xinference to set up Xinference and run the llama-2-chat
model.
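
If you prefer to launch the model programmatically rather than through the CLI, the sketch below uses Xinference's Python client. It assumes the xinference package is installed, the server is already running locally on port 9997, and a pytorch-format 7B build of llama-2-chat is available; adjust the format and size parameters to match your download.

```python
# A minimal sketch of launching llama-2-chat via the Xinference Python client.
# Assumes the server is already running locally on port 9997.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

# Launch the built-in llama-2-chat model; returns a UID for the running model.
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_format="pytorch",    # assumption: adjust to the format you downloaded
    model_size_in_billions=7,  # assumption: pick the size you want to serve
)
print(f"Model running with UID: {model_uid}")
```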
Configuration

- API Host: http://127.0.0.1:9997
- API Key: any random string (Xinference does not validate the key by default, but the field must not be empty)
- Model: llama-2-chat
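
To verify the configuration, you can call the server's OpenAI-compatible endpoint directly. The sketch below assumes the openai Python package (v1+) is installed and the llama-2-chat model from the preparation step is running; the API key value is arbitrary since Xinference does not check it.

```python
# Quick sanity check against the OpenAI-compatible API exposed by Xinference.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:9997/v1",  # the API Host plus the /v1 prefix
    api_key="sk-anything",                # arbitrary: Xinference ignores it
)

response = client.chat.completions.create(
    model="llama-2-chat",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```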
Troubleshooting
- Only models with chat in their name are supported; see the sketch after this list for a way to check which running models qualify.
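
If a model does not show up as selectable, you can list the models the server exposes and check their names. The sketch below reuses the OpenAI-compatible endpoint from the configuration above and simply flags names that lack "chat".

```python
# List the models Xinference exposes and flag those whose names do not
# contain "chat" (and so are not supported by this integration).
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="sk-anything")

for model in client.models.list():
    supported = "chat" in model.id
    print(f"{model.id}: {'supported' if supported else 'not supported'}")
```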