
Welcome to ai.unturf.com - Free AI Service Powered by Hermes AI

Using the Hermes AI Model

At ai.unturf.com, we offer a free AI service powered by the NousResearch/Hermes-3-Llama-3.1-8B model. Our mission is to provide accessible AI tools for everyone, embodying the principles of free as in beer and free as in freedom. You can interact with the model at no cost, and you are encouraged to contribute to and build upon the open-source code and models that we use.

Installing the OpenAI Client

Python

To install the OpenAI package for Python, use pip:

pip install openai

Node.js

To install the OpenAI package for Node.js, add it to your package.json (use the latest available version):

{
  "dependencies": {
    "openai": "^4.67.3"
  }
}
        

Run the following command to install it:

npm install

Python Example

Non-Streaming

# Python FizzBuzz example (non-streaming)
from openai import OpenAI

# No API key is required, so any placeholder string such as "none" works.
client = OpenAI(base_url="https://hermes.ai.unturf.com/v1", api_key="none")

MODEL = "NousResearch/Hermes-3-Llama-3.1-8B"

messages = [{"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}]

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.5,
    max_tokens=150
)

print(response.choices[0].message.content)
        

Streaming

# Streaming response in Python
from openai import OpenAI

client = OpenAI(base_url="https://hermes.ai.unturf.com/v1", api_key="none")

MODEL = "NousResearch/Hermes-3-Llama-3.1-8B"

messages = [
    {"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}
]

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.5,
    max_tokens=150,
    stream=True,  # Enable streaming
)

for chunk in response:
    # The final chunk may carry no content, so guard before printing.
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="")
        

Node.js Example

Non-Streaming

const OpenAI = require('openai');

const client = new OpenAI({
    baseURL: "https://hermes.ai.unturf.com/v1",
    apiKey: "dummy-api-key",  // no real key is required; any placeholder string works
});

const MODEL = "NousResearch/Hermes-3-Llama-3.1-8B";

const messages = [{"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}];

async function getResponse() {
    try {
        const response = await client.chat.completions.create({
            model: MODEL,
            messages: messages,
            temperature: 0.5,
            max_tokens: 150,
        });

        console.log(response.choices[0].message.content);
    } catch (error) {
        console.error("Error:", error.message);
    }
}

getResponse();
        

Streaming


const OpenAI = require('openai');

const client = new OpenAI({
    baseURL: "https://hermes.ai.unturf.com/v1",
    apiKey: "dummy-api-key",
});

const MODEL = "NousResearch/Hermes-3-Llama-3.1-8B";

const messages = [{"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}];

async function streamResponse() {
    try {
        const stream = await client.chat.completions.create({
            model: MODEL,
            messages: messages,
            temperature: 0.5,
            max_tokens: 150,
            stream: true,  // Enable streaming
        });

        // Use async iterator to read each chunk
        for await (const chunk of stream) {
            const msg = chunk.choices[0].delta.content;
            // The final chunk may carry no content, so guard before writing.
            if (msg) {
                process.stdout.write(msg);  // print each piece as it arrives
            }
        }
    } catch (error) {
        console.error("Error:", error.message);
    }
}

streamResponse();
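Multi-turn chat works the same way: the messages array grows with each exchange. A minimal non-streaming sketch, reusing the client and MODEL defined above (the follow-up question here is just an illustration):

// Hypothetical multi-turn sketch, reusing client and MODEL from the examples above.
async function followUp() {
    const history = [{ role: "user", content: "Give a Python Fizzbuzz solution in one line of code?" }];

    const first = await client.chat.completions.create({
        model: MODEL,
        messages: history,
        temperature: 0.5,
        max_tokens: 150,
    });

    // Keep the assistant's answer in the history, then ask a follow-up question.
    history.push({ role: "assistant", content: first.choices[0].message.content });
    history.push({ role: "user", content: "Now explain how that one-liner works." });

    const second = await client.chat.completions.create({
        model: MODEL,
        messages: history,
        temperature: 0.5,
        max_tokens: 150,
    });

    console.log(second.choices[0].message.content);
}

followUp();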
        

How we run inference (in case you want to contribute)

We use vLLM to serve models, currently as full fp16 safetensors. We keep the dependencies in a virtualenv.

We are considering adding ollama support for better quantization options.

To stand up a replica cluster on a new domain:


cd ~
python3 -m venv env            # keep vLLM's dependencies in a virtualenv
source env/bin/activate
pip install vllm
# Launch vLLM's OpenAI-compatible server on port 18888
python -m vllm.entrypoints.openai.api_server --model NousResearch/Hermes-3-Llama-3.1-8B --host 0.0.0.0 --port 18888 --max-model-len 16000
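Once the server is running, a quick sanity check (assuming Node 18+ with built-in fetch and the default port above) is to ask vLLM's OpenAI-compatible server which models it is serving:

// Health check against the local vLLM server; assumes it listens on localhost:18888.
fetch("http://localhost:18888/v1/models")
    .then((res) => res.json())
    .then((data) => console.log(data.data.map((m) => m.id)));  // should list NousResearch/Hermes-3-Llama-3.1-8B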
        

If you want to see how we set up the proxy, check out /etc/caddy/Caddyfile:



ai.unturf.com {
    root * /opt/www
    file_server
    log {
        output file /var/log/caddy/ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }
    tls {
        on_demand
    }
}

hermes.ai.unturf.com {
    reverse_proxy :18888
    log {
        output file /var/log/caddy/hermes.ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }
    tls {
        on_demand
    }
}

	

We will likely implement a rate limit based on client IP address.
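If a limit does land, clients can treat a 429 response as a signal to back off and retry. Here is a minimal sketch with the Node client; the backoff policy and the 429 status are assumptions about how a limit might be surfaced, not current behavior:

const OpenAI = require('openai');

const client = new OpenAI({
    baseURL: "https://hermes.ai.unturf.com/v1",
    apiKey: "none",
});

// Illustrative retry-on-429 wrapper; the service does not enforce a rate limit today.
async function createWithRetry(params, retries = 3) {
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await client.chat.completions.create(params);
        } catch (error) {
            // The openai client attaches the HTTP status to API errors.
            if (error.status === 429 && attempt < retries) {
                const waitMs = 1000 * 2 ** attempt;  // simple exponential backoff
                await new Promise((resolve) => setTimeout(resolve, waitMs));
            } else {
                throw error;
            }
        }
    }
}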

Client-Side-Only Example: chat with this page

Because we don't use API keys, there is no real need for a server.

View the page source; it's _all_ there!

TODO: a JavaScript CDN bundle of the minimum viable web client for this demo will be hosted and provided free of charge.
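Until that bundle exists, here is a minimal sketch of what a client-side call can look like with the browser's built-in fetch, using the same endpoint and model as above (error handling and streaming omitted for brevity; it assumes the endpoint accepts cross-origin requests, as this page relies on):

// Minimal browser-side chat completion using fetch; no API key is needed.
async function ask(question) {
    const res = await fetch("https://hermes.ai.unturf.com/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: "NousResearch/Hermes-3-Llama-3.1-8B",
            messages: [{ role: "user", content: question }],
            temperature: 0.5,
            max_tokens: 150,
        }),
    });
    const data = await res.json();
    return data.choices[0].message.content;
}

ask("Give a Python Fizzbuzz solution in one line of code?").then(console.log);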

Questions, Comments, Discussions