Introducing Hermes AI and TTS Speech Endpoints
At ai.unturf.com, we offer free AI services powered by the NousResearch/Hermes-3-Llama-3.1-8B model and a TTS (Text-to-Speech) endpoint. Our mission is to provide accessible AI tools for everyone, embodying the principles of both free as in beer & free as in freedom. You can interact with our models without any cost, and you are encouraged to contribute and build upon the open-source code & models that we use.
We intend to be a drop-in replacement: you can use the existing open-source OpenAI client to communicate with us.
Web Client-Only Solution: Interact with AI Services Directly from Static Sites or CDNs
Because we don't require a valid API key, we don't have any real need for a server.
Add this LLM to any static site or CDN.
This web client-only solution uses uncloseai.js to make the browser act as the client, talking to the API directly without an intermediary server. Because no valid API key is required, the API handles requests on behalf of the browser client itself, which keeps things efficient for thin clients, especially battery-powered devices like phones and laptops.
Installing uncloseai.js on Your Website
Add AI capabilities to any static website or web application with just one line of code. uncloseai.js automatically creates a floating "uncloseai." button that provides access to Hermes AI, text-to-speech, file upload, and more.
Quick Installation
Add this script tag to your HTML - that's it! The floating AI button appears automatically:
<script src="https://uncloseai.com/uncloseai.js" type="module"></script>
Configuration Options
To use your CSS framework's styling instead of uncloseai.js custom styles:
<script>
window.UNCLOSEAI_CUSTOM_STYLING = false;
</script>
To disable the floating button (if you only want the API functions):
<script>
</script>
Class-Based Integration
Embed AI features directly into your HTML using CSS classes and data attributes:
Full Interface: Complete AI chat with all features
<div class="uncloseai" data-features="full"></div>
Chat Only: Just the AI chat interface
<div class="uncloseai" data-features="chat"></div>
Specific Features: Choose which buttons to include
<div class="uncloseai" data-features="tts,upload,read"></div>
<div class="uncloseai" data-features="tts"></div>
Available Features: chat, tts, upload, read, full
What You Get
- 🤖 Floating AI Assistant - Always-accessible "uncloseai." button in bottom right
- 📖 Page-Aware - Hermes AI understands your page content and generates contextual introductions
- 🔊 Text-to-Speech - Convert any text to speech with multiple voice options
- 📁 File Upload - Upload and discuss documents, images, and more
- 💾 Conversation History - Persistent chat history with delete functionality
- 🎨 Framework Compatible - Works with PicoCSS, PureCSS, Bootstrap, and more
Live Demo: Try uncloseai.js in action and see the floating button and AI features working on a real page!
Installing the OpenAI Client
Python
To install the OpenAI package for Python, use pip:
pip install openai
Node.js
To install the OpenAI package for Node.js, add it (use the latest version) to the dependencies in your package.json:
{
  "dependencies": {
    "openai": "^4.67.3"
  }
}
Run the following command to install it:
npm install
Using the Hermes AI Model
Python Example
Non-Streaming
# Python Fizzbuzz Example
from openai import OpenAI

client = OpenAI(base_url="https://hermes.ai.unturf.com/v1", api_key="choose-any-value")

# MODEL = "NousResearch/Hermes-3-Llama-3.1-8B"
MODEL = "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic"

messages = [{"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}]

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.5,
    max_tokens=150,
)
print(response.choices[0].message.content)
Streaming
# Streaming response in Python
from openai import OpenAI

client = OpenAI(base_url="https://hermes.ai.unturf.com/v1", api_key="choose-any-value")
MODEL = "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic"

messages = [
    {"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}
]

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.5,
    max_tokens=150,
    stream=True,  # Enable streaming
)
for chunk in response:
    # delta.content is None on some chunks (e.g. the final one), so check it
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Node.js Example
Non-Streaming
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: "https://hermes.ai.unturf.com/v1",
  apiKey: "dummy-api-key",
});

const MODEL = "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic";
const messages = [{ role: "user", content: "Give a Python Fizzbuzz solution in one line of code?" }];

async function getResponse() {
  try {
    const response = await client.chat.completions.create({
      model: MODEL,
      messages: messages,
      temperature: 0.5,
      max_tokens: 150,
    });
    console.log(response.choices[0].message.content);
  } catch (error) {
    console.error("Error:", error.response ? error.response.data : error.message);
  }
}

getResponse();
Streaming
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: "https://hermes.ai.unturf.com/v1",
  apiKey: "dummy-api-key",
});

const MODEL = "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic";
const messages = [{ role: "user", content: "Give a Python Fizzbuzz solution in one line of code?" }];

async function streamResponse() {
  try {
    const stream = await client.chat.completions.create({
      model: MODEL,
      messages: messages,
      temperature: 0.5,
      max_tokens: 150,
      stream: true, // Enable streaming
    });
    // Use an async iterator to read each chunk as it arrives
    for await (const chunk of stream) {
      const msg = chunk.choices[0].delta.content ?? ""; // content may be absent on some chunks
      process.stdout.write(msg);
    }
  } catch (error) {
    console.error("Error:", error.response ? error.response.data : error.message);
  }
}

streamResponse();
Using the Text To Speech Endpoint
Python Example
# TTS Speech Example in Python
import openai

client = openai.OpenAI(
    api_key="YOLO",
    base_url="https://speech.ai.unturf.com/v1",
)

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    speed=0.9,
    input="I think so therefore, Today is a wonderful day to build something people love!",
) as response:
    response.stream_to_file("speech.mp3")
Node.js Example
const fs = require('fs');
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: "https://speech.ai.unturf.com/v1",
  apiKey: "YOLO",
});

async function getSpeech() {
  try {
    // Note: the Node client has no with_streaming_response helper; audio.speech.create
    // returns a Response whose bytes we write to disk
    const response = await client.audio.speech.create({
      model: "tts-1",
      voice: "alloy",
      speed: 0.9,
      input: "I think so therefore, Today is a wonderful day to build something people love!",
    });
    const buffer = Buffer.from(await response.arrayBuffer());
    fs.writeFileSync("speech.mp3", buffer);
  } catch (error) {
    console.error("Error:", error.response ? error.response.data : error.message);
  }
}

getSpeech();
How we run inference
This section is optional: read on only if you want to contribute idle GPU time to the project, or to reproduce everything in your own cluster.
We use vLLM to run the models, currently as full f16 safetensors, and we use a virtualenv to hold the dependencies. We are considering supporting Ollama for better quantization support.
Stand up a replica cluster on a new domain.
sudo apt-get install gcc python3.12-dev
cd ~
python3 -m venv env
source env/bin/activate
pip install vllm
python -m vllm.entrypoints.openai.api_server --model adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic --host 0.0.0.0 --port 18888 --max-model-len 82000
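Once the vLLM server is up, you can sanity-check the replica by listing its models through the OpenAI-compatible /v1/models endpoint. This is a minimal sketch assuming the replica runs locally on port 18888 as in the command above; the helper names are illustrative, not part of any API.

```python
# Sanity-check a vLLM replica by listing its models over the
# OpenAI-compatible /v1/models endpoint.
import json
import urllib.request

def model_ids(models_response):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]

def check_replica(base_url="http://localhost:18888"):
    """Return the list of model ids the replica is currently serving."""
    with urllib.request.urlopen(base_url + "/v1/models", timeout=5) as resp:
        return model_ids(json.load(resp))

# The response body looks roughly like this; against a live replica,
# check_replica() returns the ids of whatever models it has loaded.
sample = {"data": [{"id": "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic"}]}
print(model_ids(sample))  # ['adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic']
```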
The Speech endpoint or TTS uses openedai-speech running via Docker.
If you want to see how we set up the proxy, check out /etc/caddy/Caddyfile
ai.unturf.com {
    root * /opt/www
    file_server
    log {
        output file /var/log/caddy/ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }
    tls {
        on_demand
    }
}

hermes.ai.unturf.com {
    reverse_proxy :18888
    log {
        output file /var/log/caddy/hermes.ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }
    tls {
        on_demand
    }
}

speech.ai.unturf.com {
    reverse_proxy :8000
    log {
        output file /var/log/caddy/speech.ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }
    tls {
        on_demand
    }
}
We will likely implement a rate limit based on client IP address.
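If such a limit lands, clients should expect occasional rejected requests (e.g. HTTP 429). Here is a hypothetical client-side retry-with-exponential-backoff sketch; the helper name, retry count, and delays are our own choices, not part of the service:

```python
# Hypothetical client-side retry with exponential backoff, for when a
# request is rejected (e.g. by a future per-IP rate limit).
import time

def with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay, 2x, 4x, ... then retry."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # out of retries, surface the error
            sleep(base_delay * (2 ** attempt))

# Example with a stub that fails twice before succeeding:
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_backoff(flaky, sleep=lambda s: None)  # skip real sleeping
print(result)  # ok
```

In practice `call` would wrap a `client.chat.completions.create(...)` request from the examples above.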