Using the Hermes AI Model
At ai.unturf.com, we offer a free AI service powered by the model NousResearch/Hermes-3-Llama-3.1-8B. Our mission is to provide accessible AI tools for everyone, embodying the principles of both free as in beer & free as in freedom. You can interact with our model without any cost, and you are encouraged to contribute and build upon the open-source code & models that we use.
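The service speaks the standard OpenAI chat-completions API, so you can try it from the command line before installing anything. Any API key string is accepted; the prompt below is just an example:
curl https://hermes.ai.unturf.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{
    "model": "NousResearch/Hermes-3-Llama-3.1-8B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'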
Installing the OpenAI Client
Python
To install the OpenAI package for Python, use pip:
pip install openai
Node.js
To install the OpenAI package for Node.js, declare it as a dependency in your package.json (use the latest available version):
{
  "dependencies": {
    "openai": "^4.67.3"
  }
}
Run the following command to install it:
npm install
Python Example
Non-Streaming
# Python Fizzbuzz Example (non-streaming)
from openai import OpenAI

client = OpenAI(base_url="https://hermes.ai.unturf.com/v1", api_key="none")
MODEL = "NousResearch/Hermes-3-Llama-3.1-8B"
messages = [{"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}]

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.5,
    max_tokens=150,
)
print(response.choices[0].message.content)
Streaming
# Streaming response in Python
from openai import OpenAI

client = OpenAI(base_url="https://hermes.ai.unturf.com/v1", api_key="none")
MODEL = "NousResearch/Hermes-3-Llama-3.1-8B"
messages = [
    {"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}
]

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.5,
    max_tokens=150,
    stream=True,  # Enable streaming
)
for chunk in response:
    # delta.content is None on some chunks (e.g. the final one), so test the value
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
Node.js Example
Non-Streaming
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: "https://hermes.ai.unturf.com/v1",
  apiKey: "dummy-api-key",
});
const MODEL = "NousResearch/Hermes-3-Llama-3.1-8B";
const messages = [{ role: "user", content: "Give a Python Fizzbuzz solution in one line of code?" }];

async function getResponse() {
  try {
    const response = await client.chat.completions.create({
      model: MODEL,
      messages: messages,
      temperature: 0.5,
      max_tokens: 150,
    });
    console.log(response.choices[0].message.content);
  } catch (error) {
    console.error("Error:", error.response ? error.response.data : error.message);
  }
}

getResponse();
Streaming
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: "https://hermes.ai.unturf.com/v1",
  apiKey: "dummy-api-key",
});
const MODEL = "NousResearch/Hermes-3-Llama-3.1-8B";
const messages = [{ role: "user", content: "Give a Python Fizzbuzz solution in one line of code?" }];

async function streamResponse() {
  try {
    const stream = await client.chat.completions.create({
      model: MODEL,
      messages: messages,
      temperature: 0.5,
      max_tokens: 150,
      stream: true, // Enable streaming
    });
    // Use an async iterator to read each chunk as it arrives
    for await (const chunk of stream) {
      const msg = chunk.choices[0]?.delta?.content ?? ""; // content may be absent on some chunks
      process.stdout.write(msg);
    }
  } catch (error) {
    console.error("Error:", error.response ? error.response.data : error.message);
  }
}

streamResponse();
How We Run Inference (If You Want to Contribute)
We use vLLM to serve the model, currently as full-precision fp16 safetensors, with the dependencies held in a virtualenv. We are considering supporting ollama for better quantization support.
One way to contribute is to stand up a replica cluster on a new domain:
cd ~
python3 -m venv env
source env/bin/activate
pip install vllm
python -m vllm.entrypoints.openai.api_server --model NousResearch/Hermes-3-Llama-3.1-8B --host 0.0.0.0 --port 18888 --max-model-len 16000
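Once the server is running, a quick smoke test from the same machine (the port matches the command above) could look like this; vLLM's OpenAI-compatible server exposes the standard /v1/models and /v1/chat/completions endpoints:
curl http://localhost:18888/v1/models
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "NousResearch/Hermes-3-Llama-3.1-8B",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 5
  }'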
If you want to see how we set up the proxy, check out /etc/caddy/Caddyfile:
ai.unturf.com {
    root * /opt/www
    file_server

    log {
        output file /var/log/caddy/ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }

    tls {
        on_demand
    }
}

hermes.ai.unturf.com {
    reverse_proxy :18888

    log {
        output file /var/log/caddy/hermes.ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }

    tls {
        on_demand
    }
}
We will likely implement a rate limit based on client IP address.
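As a sketch only (this is not in our Caddyfile today), the third-party mholt/caddy-ratelimit module would let us express a per-IP limit roughly like this; it requires a custom Caddy build, and the zone name and numbers below are placeholders:
hermes.ai.unturf.com {
    rate_limit {
        zone per_ip {
            key    {remote_host}
            events 60
            window 1m
        }
    }
    reverse_proxy :18888
}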
Client-Side-Only Example: Chat with This Page
Because we don't use API keys, we don't have any real need for a server.
View the page source; it's _all_ there!
TODO: a Javascript CDN bundle of the minimum viable web client for this demo will be hosted and provided free of charge.
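Until that bundle exists, here is a minimal sketch of such a client in plain browser JavaScript. It assumes the endpoint allows cross-origin requests (which the no-server design implies); askHermes and the prompt are just illustrative names:
// Minimal browser client: POST straight to the OpenAI-compatible endpoint.
// No server and no real API key are needed; any key string is accepted.
async function askHermes(prompt) {
  const res = await fetch("https://hermes.ai.unturf.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer none",
    },
    body: JSON.stringify({
      model: "NousResearch/Hermes-3-Llama-3.1-8B",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 150,
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

askHermes("Give a Python Fizzbuzz solution in one line of code?")
  .then((reply) => console.log(reply));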