Introducing Hermes AI and TTS Speech Endpoints
At ai.unturf.com, we offer free AI services powered by the NousResearch/Hermes-3-Llama-3.1-8B model and a TTS (Text-to-Speech) endpoint. Our mission is to provide accessible AI tools for everyone, embodying the principles of both free as in beer & free as in freedom. You can interact with our models without any cost, and you are encouraged to contribute and build upon the open-source code & models that we use.
We intend to be a drop-in replacement: you can use the existing open-source OpenAI client to communicate with us.
Web Client-Only Solution: Interact with AI Services Directly from Static Sites or CDNs
Because we don't require a valid API key, we don't have any real need for a server.
Add this LLM to any static site or CDN.
This web client-only solution uses uncloseai.js to make the browser act as the client, talking to the API directly without an intermediary server. Because no valid API key is required, the API handles requests on behalf of the browser client itself, which keeps things efficient for thin clients, especially battery-powered devices like phones and laptops.
Installing uncloseai.js on Your Website
Add AI capabilities to any static website or web application with just one line of code. uncloseai.js automatically creates a floating "uncloseai." button that provides access to Hermes AI, text-to-speech, file upload, and more.
Quick Installation
Add this script tag to your HTML - that's it! The floating AI button appears automatically:
<script src="https://uncloseai.com/uncloseai.js" type="module"></script>
Configuration Options
To use your CSS framework's styling instead of uncloseai.js custom styles:
<script>
window.UNCLOSEAI_CUSTOM_STYLING = false;
</script>
To disable the floating button (if you only want the API functions):
<script>
</script>
Class-Based Integration
Embed AI features directly into your HTML using CSS classes and data attributes:
Full Interface: Complete AI chat with all features
<div class="uncloseai" data-features="full"></div>
Chat Only: Just the AI chat interface
<div class="uncloseai" data-features="chat"></div>
Specific Features: Choose which buttons to include
<div class="uncloseai" data-features="tts,upload,read"></div>
<div class="uncloseai" data-features="tts"></div>
Available Features: chat, tts, upload, read, full
What You Get
- 🤖 Floating AI Assistant - Always-accessible "uncloseai." button in bottom right
- 📖 Page-Aware - Hermes AI understands your page content and generates contextual introductions
- 🔊 Text-to-Speech - Convert any text to speech with multiple voice options
- 📁 File Upload - Upload and discuss documents, images, and more
- 💾 Conversation History - Persistent chat history with delete functionality
- 🎨 Framework Compatible - Works with PicoCSS, PureCSS, Bootstrap, and more
Live Demo: Try uncloseai.js in action and see the floating button and AI features working on a real page!
Installing the OpenAI Client
Python
To install the OpenAI package for Python, use pip:
pip install openai
Node.js
To install the OpenAI package for Node.js, add it (use the latest version) to the dependencies in your package.json:
{
  "dependencies": {
    "openai": "^4.67.3"
  }
}
Run the following command to install it:
npm install
Using the Hermes AI Model
Python Example
Non-Streaming
# Python Fizzbuzz Example
from openai import OpenAI

client = OpenAI(base_url="https://hermes.ai.unturf.com/v1", api_key="choose-any-value")

# MODEL = "NousResearch/Hermes-3-Llama-3.1-8B"
MODEL = "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic"

messages = [{"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}]

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.5,
    max_tokens=150,
)
print(response.choices[0].message.content)
Streaming
# Streaming response in Python
from openai import OpenAI

client = OpenAI(base_url="https://hermes.ai.unturf.com/v1", api_key="choose-any-value")
MODEL = "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic"

messages = [
    {"role": "user", "content": "Give a Python Fizzbuzz solution in one line of code?"}
]

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.5,
    max_tokens=150,
    stream=True,  # Enable streaming
)
for chunk in response:
    # delta.content is None on some chunks (e.g. the final one), so check it
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Node.js Example
Non-Streaming
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: "https://hermes.ai.unturf.com/v1",
  apiKey: "dummy-api-key",
});

const MODEL = "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic";
const messages = [{ role: "user", content: "Give a Python Fizzbuzz solution in one line of code?" }];

async function getResponse() {
  try {
    const response = await client.chat.completions.create({
      model: MODEL,
      messages: messages,
      temperature: 0.5,
      max_tokens: 150,
    });
    console.log(response.choices[0].message.content);
  } catch (error) {
    console.error("Error:", error.response ? error.response.data : error.message);
  }
}

getResponse();
Streaming
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: "https://hermes.ai.unturf.com/v1",
  apiKey: "dummy-api-key",
});

const MODEL = "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic";
const messages = [{ role: "user", content: "Give a Python Fizzbuzz solution in one line of code?" }];

async function streamResponse() {
  try {
    const stream = await client.chat.completions.create({
      model: MODEL,
      messages: messages,
      temperature: 0.5,
      max_tokens: 150,
      stream: true, // Enable streaming
    });
    // Use an async iterator to read each chunk as it arrives
    for await (const chunk of stream) {
      const msg = chunk.choices[0].delta.content ?? ""; // content may be absent on some chunks
      process.stdout.write(msg);
    }
  } catch (error) {
    console.error("Error:", error.response ? error.response.data : error.message);
  }
}

streamResponse();
Using the Text To Speech Endpoint
Python Example
# TTS Speech Example in Python
import openai

client = openai.OpenAI(
    api_key="YOLO",
    base_url="https://speech.ai.unturf.com/v1",
)

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    speed=0.9,
    input="I think so therefore, Today is a wonderful day to build something people love!",
) as response:
    response.stream_to_file("speech.mp3")
Node.js Example
const fs = require('fs');
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: "https://speech.ai.unturf.com/v1",
  apiKey: "YOLO",
});

async function getSpeech() {
  try {
    // Note: the Node client has no with_streaming_response helper; audio.speech.create
    // returns a Response whose bytes we write to disk
    const response = await client.audio.speech.create({
      model: "tts-1",
      voice: "alloy",
      speed: 0.9,
      input: "I think so therefore, Today is a wonderful day to build something people love!",
    });
    const buffer = Buffer.from(await response.arrayBuffer());
    fs.writeFileSync("speech.mp3", buffer);
  } catch (error) {
    console.error("Error:", error.response ? error.response.data : error.message);
  }
}

getSpeech();
How we run inference
This section is optional: read on only if you want to contribute idle GPU time to the project, or to reproduce everything in your own cluster.
We use vLLM to run the models, currently as full f16 safetensors, and we use a virtualenv to hold the dependencies. We are considering supporting Ollama for better quantization support.
Stand up a replica cluster on a new domain.
sudo apt-get install gcc python3.12-dev
cd ~
python3 -m venv env
source env/bin/activate
pip install vllm
python -m vllm.entrypoints.openai.api_server --model adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic --host 0.0.0.0 --port 18888 --max-model-len 82000
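Once the vLLM server is up, you can sanity-check the replica by listing its models through the OpenAI-compatible /v1/models endpoint. This is a minimal sketch assuming the replica runs locally on port 18888 as in the command above; the helper names are illustrative, not part of any API.

```python
# Sanity-check a vLLM replica by listing its models over the
# OpenAI-compatible /v1/models endpoint.
import json
import urllib.request

def model_ids(models_response):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]

def check_replica(base_url="http://localhost:18888"):
    """Return the list of model ids the replica is currently serving."""
    with urllib.request.urlopen(base_url + "/v1/models", timeout=5) as resp:
        return model_ids(json.load(resp))

# The response body looks roughly like this; against a live replica,
# check_replica() returns the ids of whatever models it has loaded.
sample = {"data": [{"id": "adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic"}]}
print(model_ids(sample))  # ['adamo1139/Hermes-3-Llama-3.1-8B-FP8-Dynamic']
```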
The Speech endpoint or TTS uses openedai-speech running via Docker.
If you want to see how we set up the proxy, check out /etc/caddy/Caddyfile
ai.unturf.com {
    root * /opt/www
    file_server
    log {
        output file /var/log/caddy/ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }
    tls {
        on_demand
    }
}

hermes.ai.unturf.com {
    reverse_proxy :18888
    log {
        output file /var/log/caddy/hermes.ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }
    tls {
        on_demand
    }
}

speech.ai.unturf.com {
    reverse_proxy :8000
    log {
        output file /var/log/caddy/speech.ai.unturf.com.log {
            roll_size 50mb
            roll_keep 5
        }
    }
    tls {
        on_demand
    }
}
We will likely implement a rate limit based on client IP address.
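If such a limit lands, clients should expect occasional rejected requests (e.g. HTTP 429). Here is a hypothetical client-side retry-with-exponential-backoff sketch; the helper name, retry count, and delays are our own choices, not part of the service:

```python
# Hypothetical client-side retry with exponential backoff, for when a
# request is rejected (e.g. by a future per-IP rate limit).
import time

def with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay, 2x, 4x, ... then retry."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # out of retries, surface the error
            sleep(base_delay * (2 ** attempt))

# Example with a stub that fails twice before succeeding:
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_backoff(flaky, sleep=lambda s: None)  # skip real sleeping
print(result)  # ok
```

In practice `call` would wrap a `client.chat.completions.create(...)` request from the examples above.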