Cloud and On-Prem with Azure Relay Hybrid Connection
The purpose of Azure Relay Hybrid Connections is to let cloud applications reach into our on-premises applications without poking holes in the firewall, leaving our network infrastructure relatively untouched. It removes networking-related complexity from the solution and reduces our overheads. The service is designed for TCP-based traffic, however.
Say we want an app in Azure to send requests (over TCP) into our on-premises application to carry out some on-prem work or logic; Azure Relay Hybrid Connections is one good way of achieving this. In this blog we will look at Hybrid Connections as a standalone service (rather than as the feature that lives within App Service).
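Before any of the code below will work, a Relay namespace and a hybrid connection need to exist in Azure. As a rough sketch (the resource group, namespace, and connection names here are illustrative placeholders, not values from this post), the Azure CLI steps look like this:

```shell
# Create a resource group and a Relay namespace (names are placeholders)
az group create --name my-relay-rg --location westeurope
az relay namespace create --resource-group my-relay-rg --name my-relay-namespace

# Create the hybrid connection inside the namespace
az relay hyco create --resource-group my-relay-rg --namespace-name my-relay-namespace --name my-hybrid-connection

# List the keys for the default RootManageSharedAccessKey policy,
# used by both the listener and the sender code below
az relay namespace authorization-rule keys list --resource-group my-relay-rg --namespace-name my-relay-namespace --name RootManageSharedAccessKey
```

The same can of course be done in the Azure portal; the key and key name from the last command are what the code below refers to.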
Set Up the Hybrid Connection Listener
For the on-premises side, we can set up a listener application that serves up responses to requests arriving over the hybrid connection. Here is an example that listens for requests and offloads them to Ollama running the Phi4-mini small language model:
using Microsoft.Azure.Relay;
using Newtonsoft.Json;
using System.Text;
using TokenProvider = Microsoft.Azure.Relay.TokenProvider;

string KeyName = "{your_SASPolicy_key_name}"; // usually RootManageSharedAccessKey
string Key = "{your_key}";

var tokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider(KeyName, Key);
var listener = new HybridConnectionListener(
    new Uri($"sb://{your_relay_namespace}.servicebus.windows.net/{your_hybridconnectionName}"),
    tokenProvider);

listener.RequestHandler = async (context) =>
{
    Console.WriteLine("Received request!");
    string apiResponse;
    using (var reader = new StreamReader(context.Request.InputStream, Encoding.UTF8))
    {
        string body = await reader.ReadToEndAsync();
        Console.WriteLine($"Received body: {body}");
        apiResponse = await LLMResponse(body);
    }
    var response = Encoding.UTF8.GetBytes(apiResponse);
    context.Response.StatusCode = System.Net.HttpStatusCode.OK;
    context.Response.StatusDescription = "OK";
    await context.Response.OutputStream.WriteAsync(response, 0, response.Length);
    context.Response.Close();
};

listener.Online += (o, e) => Console.WriteLine("Listener is online.");

await listener.OpenAsync();
Console.WriteLine("Listener opened. Press ENTER to exit.");
Console.ReadLine();
await listener.CloseAsync();

// Local LLM inference via Ollama's streaming /api/generate endpoint
async Task<string> LLMResponse(string userInput)
{
    // Note: in production, reuse a single static HttpClient rather than creating one per request
    HttpClient client = new HttpClient();
    // Serialize the payload instead of concatenating strings, so quotes in userInput are escaped correctly
    string payload = JsonConvert.SerializeObject(new { model = "phi4-mini:latest", prompt = userInput, stream = true });
    var request = new HttpRequestMessage(HttpMethod.Post, "http://localhost:11434/api/generate")
    {
        Content = new StringContent(payload, Encoding.UTF8, "application/json")
    };
    using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
    {
        response.EnsureSuccessStatusCode();
        var responseData = new StringBuilder();
        using (var stream = await response.Content.ReadAsStreamAsync())
        using (var reader = new StreamReader(stream))
        {
            while (!reader.EndOfStream)
            {
                var chunk = await reader.ReadLineAsync();
                if (chunk != null)
                {
                    // Each streamed line is a JSON object carrying the next token(s) of the answer
                    var datachunk = JsonConvert.DeserializeObject<ResponseModel>(chunk)?.Response;
                    Console.Write(datachunk);
                    responseData.Append(datachunk);
                }
            }
        }
        return responseData.ToString();
    }
}

public class ResponseModel
{
    [JsonProperty("model")]
    public string Model { get; set; }

    [JsonProperty("created_at")]
    public DateTime CreatedAt { get; set; }

    [JsonProperty("response")]
    public string Response { get; set; }

    [JsonProperty("done")]
    public bool Done { get; set; }
}
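The listener assumes Ollama is running locally with the phi4-mini model available (the model name matches the one used in the code; 11434 is Ollama's default port). A quick sanity check before wiring it up to the relay:

```shell
# Pull the model used by the listener code
ollama pull phi4-mini

# Verify the generate endpoint responds locally before involving the relay
curl http://localhost:11434/api/generate -d '{"model":"phi4-mini:latest","prompt":"hello","stream":false}'
```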
Set Up the Azure Function Caller
With our on-premises application listening on the Hybrid Connection, client applications in Azure can now reach back into it by posting requests to the same Hybrid Connection endpoint. Here is an example with an Azure Function:
[Function("LLMInferencer")]
public async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "get", "post")] HttpRequest req)
{
    string KeyName = "{your_SASPolicy_key_name}"; // usually RootManageSharedAccessKey
    string Key = "{hybrid_connection_key}";
    var tokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider(KeyName, Key);
    // use https here, not sb://
    var uri = new Uri($"https://{your_relay_namespace}.servicebus.windows.net/{your_hybridconnectionName}");
    var token = (await tokenProvider.GetTokenAsync(uri.AbsoluteUri, TimeSpan.FromHours(1))).TokenString;
    using (var client = new HttpClient())
    {
        try
        {
            string messageToSend = "can you simulate a quick conversation about the future of mobile and cloud communications between Satya Nadella and Sundar Pichai";
            using (var request = new HttpRequestMessage(HttpMethod.Post, uri))
            {
                request.Headers.Add("ServiceBusAuthorization", token);
                request.Content = new StringContent(messageToSend, Encoding.UTF8, "text/plain");
                Console.WriteLine($"Sending POST request to {uri}...");
                HttpResponseMessage response = await client.SendAsync(request);
                Console.WriteLine($"Received status code: {response.StatusCode}");
                string responseBody = await response.Content.ReadAsStringAsync();
                if (response.IsSuccessStatusCode)
                {
                    Console.WriteLine($"Response from server: {responseBody}");
                }
                else
                {
                    Console.WriteLine($"Error response body: {responseBody}");
                }
                // Pass the relay's status code through rather than hard-coding one
                return new ContentResult
                {
                    Content = responseBody,
                    ContentType = "text/plain",
                    StatusCode = (int)response.StatusCode
                };
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
            if (ex.InnerException != null)
            {
                Console.WriteLine($"Inner Exception: {ex.InnerException.Message}");
            }
            return new ContentResult
            {
                Content = ex.Message,
                ContentType = "text/plain",
                StatusCode = 500
            };
        }
    }
}
The Azure Function above can be run locally or deployed to Azure. The code listening on the Hybrid Connection (wherever in the world it might be!) receives the message and, in this case, returns a result from a local LLM running on the same machine.
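For a quick end-to-end check without the Function, you can also post to the relay endpoint directly, since Hybrid Connections accepts plain HTTPS requests carrying a SAS token in the ServiceBusAuthorization header. In this sketch, $SAS_TOKEN stands for a token you have generated separately (for example with TokenProvider as in the Function code), and the URL placeholders match those used above:

```shell
# $SAS_TOKEN must hold a valid SAS token for the relay endpoint
curl -X POST "https://{your_relay_namespace}.servicebus.windows.net/{your_hybridconnectionName}" \
  -H "ServiceBusAuthorization: $SAS_TOKEN" \
  -H "Content-Type: text/plain" \
  -d "tell me a short joke"
```

If the listener is online, the response body is whatever the on-premises LLM produced for the prompt.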