Feb 21, 2025

Why does vector search need object storage as its foundation?

Learn how to build a high-performance retrieval pipeline with UltiHash and Zilliz for fast multimodal search across text + images.

Vector search is changing the way we find information, particularly for unstructured data, by complementing traditional SQL-based queries with similarity search rather than exact matches. Instead of just matching keywords, it uses embeddings—numerical representations that capture the meaning and context of the content. This lets you search by similarity, so you get relevant results even if the words don’t exactly match.

Think about image search: instead of looking for a specific file name, you can simply type “snowy mountain at sunset” and instantly see the most relevant results. That's because vector search organizes data in a way that truly understands what you're looking for, not just matching text.
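
To make “searching by similarity” concrete, here is a tiny sketch (purely illustrative, with made-up 4-dimensional vectors; real CLIP embeddings have 512 dimensions) of the cosine-similarity comparison that underpins the search later in this post:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity = dot product of the two vectors divided by the product of their norms.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query     = [0.9, 0.1, 0.0, 0.3]   # embedding of the text query
snowy_mtn = [0.8, 0.2, 0.1, 0.4]   # embedding of a semantically close image
city_cafe = [0.1, 0.9, 0.7, 0.0]   # embedding of an unrelated image

print(cosine_similarity(query, snowy_mtn))  # ~0.98 -> highly relevant
print(cosine_similarity(query, city_cafe))  # ~0.16 -> not relevant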

But here’s the catch - vector databases don’t actually store the raw data. They index those embeddings, while the actual images, videos, or documents reside in object storage. That storage layer must be highly scalable and fast; otherwise, search performance suffers, costs rise, and scaling becomes a real headache.

Object storage is the best foundation for vector search because:

✅ Scales effortlessly - Handles billions of objects without a drop in performance.
✅ Separates storage & compute - Keeps search fast and costs under control.
✅ Built for retrieval - AI-powered search, recommendations, and discovery demand instant data access.

If vector search is the engine, object storage is the fuel. Without the right storage layer, retrieval slows down, costs spiral, and scaling becomes a problem.

When building a retrieval pipeline, the choice of storage and search tools matters. I picked UltiHash and Zilliz because they each solve a specific part of the problem.

  • UltiHash is high-performance object storage, making it efficient to store and retrieve large datasets such as image collections.
  • Zilliz is built for vector search - it’s the managed version of the open-source Milvus. Since I'm using CLIP embeddings, I needed a database that can handle high-dimensional, multimodal search effectively.

With UltiHash storing the raw images and Zilliz handling the embeddings, this setup ensures efficient storage while enabling fast similarity search. Okay, theory aside, let’s test it… I documented the steps from my little experiment setting up vector search on an image dataset, using Zilliz as the vector DB and UltiHash as object storage.

Want to follow along with the code?
Check out this GitHub repo: https://github.com/UltiHash/vector-search-with-ultihash

Prerequisites

When setting up my vector search, I started with the foundation: a way to store and retrieve raw data efficiently to enable fast, semantic search. The raw data - images, videos, or documents - isn’t stored inside the vector database itself. Instead, the embeddings are indexed separately, while the actual files live in object storage. UltiHash works well here: it is scalable, it can be deployed anywhere with Docker or Kubernetes, and it offers high-throughput performance, especially on reads.

Before you begin, create a folder at /Users/ultihash/test (for storing images and metadata) or update the code with your own folder path.

A. UltiHash setup

First, I deployed an UltiHash cluster to store the raw images. I did this on my local machine, using the test version of UltiHash on Docker.

✅ Deploy my UltiHash cluster - Guide

✅ Store raw images in my UltiHash bucket ‘landscapes’ - Guide
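
In case it helps to follow along, here is a minimal sketch (not from the guides above) of how the raw images could be uploaded to the ‘landscapes’ bucket with boto3, since UltiHash is S3-compatible. It assumes the test cluster is reachable at http://127.0.0.1:8080 (as in section C below) and that your credentials are set as environment variables; the license header registration shown in section C applies here as well.

#!/usr/bin/env python3
import os
import boto3
from pathlib import Path

# UltiHash speaks the S3 API, so a standard boto3 client works.
s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:8080",  # adjust to your cluster endpoint
    aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY")
)

bucket = "landscapes"
image_dir = Path("/Users/ultihash/test/landscapes_test")

# Upload every .jpg in the folder, keyed by its filename.
for image_path in image_dir.glob("*.jpg"):
    s3.upload_file(str(image_path), bucket, image_path.name)
    print(f"Uploaded {image_path.name} to bucket '{bucket}'")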

B.1 Do you have your images’ embeddings?

I used CLIP to generate embeddings for the images, allowing them to be matched against text queries later. This step enables semantic search, where results are based on meaning rather than exact terms.

I prepared this script in case you need to extract embeddings from your data:

  1. Load images from  /Users/ultihash/test/landscapes_test
  2. Generate embeddings with CLIP
  3. Save the metadata (filenames + embeddings) in a JSON file /Users/ultihash/test/landscapes_metadata.json
#!/usr/bin/env python3
import os
import json
import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
from pathlib import Path
from tqdm import tqdm

def main():
    # Define the input directory containing your landscape images.
    input_dir = Path("/Users/ultihash/test/landscapes_test")
    if not input_dir.exists():
        print(f"Input directory {input_dir} does not exist.")
        return

    # List only image files with supported extensions.
    supported_extensions = {".jpg", ".jpeg", ".png", ".bmp"}
    image_files = [p for p in input_dir.glob("*") if p.suffix.lower() in supported_extensions]
    if not image_files:
        print("No image files found in the directory.")
        return

    print(f"Found {len(image_files)} images in {input_dir}.")

    # Load the CLIP model and processor.
    # We use the 'openai/clip-vit-base-patch32' model for generating image embeddings.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    model.eval()  # Set model to evaluation mode
         
    metadata = []

    # Process each image (using a progress bar for feedback)
    for image_path in tqdm(image_files, desc="Processing images"):
        try:
            # Open the image and ensure it's in RGB mode.
            image = Image.open(image_path).convert("RGB")
        except Exception as e:
            print(f"Error opening image {image_path}: {e}")
            continue

        # Prepare the image for the model.
        inputs = processor(images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            # Generate the image embedding.
            image_features = model.get_image_features(**inputs)

        # Convert the tensor embedding to a list of floats.
        embedding = image_features[0].tolist()

        # Create a metadata entry containing the filename and the embedding.
        entry = {
            "filename": image_path.name,
            "embedding": embedding
        }
        metadata.append(entry)

    # Define the output JSON file path.
    output_file = Path("/Users/ultihash/test/landscapes_metadata.json")

    # Save the metadata to a JSON file with indentation for readability.
    with open(output_file, "w") as f:
        json.dump(metadata, f, indent=2)

    print(f"Metadata for {len(metadata)} images saved to {output_file}.")

if __name__ == "__main__":
    main()

B.2 Vector database setup (Zilliz)

With raw images in UltiHash, the next step was indexing the embeddings in a vector database. I used the Zilliz free plan (managed Milvus) to store and search embeddings efficiently.

✅ Start my Zilliz cluster

✅ Import my metadata collection (including embeddings)

Python script: get your metadata into the format requested by Zilliz

📌 What this script does:

  • Reads the original metadata file from /Users/ultihash/test/landscapes_metadata.json
  • Removes the .jpg extension from filenames
  • Structures the data into Zilliz's required format
  • Saves the transformed metadata as /Users/ultihash/test/landscape_metadata_col.json, ready for insertion into Zilliz
#!/usr/bin/env python3
import os
import json
from pathlib import Path

def main():
    # Define input and output file paths.
    input_file = Path("/Users/ultihash/test/landscapes_metadata.json")
    output_file = Path("/Users/ultihash/test/landscape_metadata_col.json")

    # Read the original metadata file.
    with open(input_file, "r") as f:
        metadata_list = json.load(f)

    # Transform the metadata entries.
    transformed_data = []
    for entry in metadata_list:
        # Remove the .jpg extension from the filename.
        filename = entry.get("filename", "")
        base_filename, _ = os.path.splitext(filename)

        transformed_entry = {
            "filename": base_filename,
            "embedding": entry.get("embedding", [])
        }
        transformed_data.append(transformed_entry)

    # Create the final structure for Zilliz insertion.
    zilliz_format = {
        "collectionName": "landscapes",
        "data": transformed_data
    }

    # Write the transformed metadata to the new JSON file.
    with open(output_file, "w") as f:
        json.dump(zilliz_format, f, indent=2)

    print(f"Transformed metadata saved to {output_file}")

if __name__ == "__main__":
    main()
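
If you prefer to create the collection and load the data programmatically rather than through the Zilliz import flow, here is a minimal sketch using pymilvus. The schema (a filename string plus a 512-dimensional float vector, the output size of CLIP ViT-B/32) and the index parameters are assumptions for this example; adapt them to your cluster.

#!/usr/bin/env python3
import json
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

# Connect to the Zilliz cluster (same URI/token as in section C).
connections.connect(alias="default",
                    uri="https://<your-zilliz-uri>",
                    token="<your-zilliz-token>")

# Define a simple schema: auto-generated primary key, filename, and CLIP embedding.
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="filename", dtype=DataType.VARCHAR, max_length=256),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=512),
]
schema = CollectionSchema(fields, description="CLIP embeddings for landscape images")
collection = Collection(name="landscapes", schema=schema)

# Index the embedding field for cosine-similarity search.
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "AUTOINDEX", "metric_type": "COSINE", "params": {}}
)

# Insert the transformed metadata produced by the script above.
with open("/Users/ultihash/test/landscape_metadata_col.json") as f:
    payload = json.load(f)

data = payload["data"]
collection.insert([[e["filename"] for e in data], [e["embedding"] for e in data]])
collection.flush()
print(f"Inserted {len(data)} records into 'landscapes'")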

C. Building a multimodal (text-image) retrieval pipeline

Now that I have my raw data stored in UltiHash and Zilliz handling the metadata and embeddings, I tested their integration. I used Flask, a lightweight Python web framework, to orchestrate queries from the user to the vector database and to UltiHash. A small Flask app takes little effort to set up and lets me verify that the retrieval pipeline works as intended. The API:

1️⃣ Accepts a text query (e.g., "sunset at the beach")

2️⃣ Generates an embedding from the user’s text query using CLIP

3️⃣ Performs a vector search in Zilliz, matching the query embedding against the stored image embeddings to find the top 3 most similar ones

4️⃣ Retrieves the 3 matching images from UltiHash

Let’s break it down step by step.

1. Setting up the retrieval pipeline

📦 Imports & Dependencies

First, I imported the following libraries and dependencies. I included a short description of what each does.

#!/usr/bin/env python3
import os                                             # Handles file paths and environment variables
import sys                                            # Accesses system functions and command-line arguments
import json                                           # Reads/writes JSON data
import base64                                         # Encodes/decodes binary data as text
import torch                                          # Provides the tensor backend for CLIP inference
import argparse                                       # Parses command-line arguments
import boto3                                          # Retrieves images from UltiHash
from pathlib import Path                              # Works with file system paths
from flask import Flask, request, jsonify             # Framework to handle API requests
from pymilvus import connections, Collection          # Connects to Zilliz 
from transformers import CLIPProcessor, CLIPModel     # Loads CLIP for text-to-embedding conversion
from io import BytesIO                                # Handles image processing
from PIL import Image                                 # Handles image processing


2. Connecting to Zilliz (vector database)

After importing the required libraries, I start by establishing a connection to Zilliz and specifying the collection name.

For that step, I’ll input my Zilliz URI and my Zilliz token (which are both found in My Cluster Details):

# ---------------------------
# Connect to Zilliz (Milvus)
# ---------------------------
connections.connect(
    alias="default",
    uri="https://<your-zilliz-uri>",   # Replace with your Zilliz cluster URI
    token="<your-zilliz-token>"        # Replace with your Zilliz access token
)
print("✅ Connected to Zilliz!")

collection_name = "landscapes"
collection = Collection(collection_name)  # Access the correct collection
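
One note: depending on how the collection was created, it may need to be loaded into memory before it can be searched. If the search in step 5 complains that the collection isn’t loaded, this standard pymilvus call (not in the original script) takes care of it:

collection.load()  # Load the 'landscapes' collection so vector searches can run against it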

3. Connecting to UltiHash (object storage for raw data)

Then, I set up the connection to UltiHash. It is S3 API compatible, so after importing boto3 (step 1), I just need to set up my boto3 S3 client. For that, I need my AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and my UH_LICENSE_STRING (found in my UltiHash dashboard). In this step I also define my bucket name.

# ---------------------------
# Set up boto3 S3 Client for UltiHash
# ---------------------------
s3 = boto3.client(
    's3',
    endpoint_url="http://127.0.0.1:8080",  # Adjust if necessary
    aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY")
)

# Ensure API calls include the UltiHash license
def add_license_header(request, **kwargs):
    request.headers["UH_LICENSE_STRING"] = "<your-ultihash-license>"  # Replace with your license key

s3.meta.events.register("before-sign.s3", add_license_header)

bucket = "landscapes"  # Set the correct UltiHash bucket
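
As an optional sanity check (not part of the final script), you can list a few objects in the bucket to confirm the client and license header are working:

# Optional: confirm connectivity by listing a few objects in the 'landscapes' bucket.
listing = s3.list_objects_v2(Bucket=bucket, MaxKeys=5)
for obj in listing.get("Contents", []):
    print(obj["Key"])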

4. Loading CLIP for querying

To run a vector search, I first need to generate embeddings from my text queries. Since my queries are text and the data I’m retrieving are images, I need a multimodal embedding model. For this, I chose CLIP: it generates an embedding from my text query, which is then compared with the stored image embeddings in Zilliz.

# ---------------------------
# Load CLIP text encoder (for query vector generation)
# ---------------------------
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


5. Querying Zilliz with vector search

To extract the embedding from my text query and compare it against the embeddings stored in Zilliz to find the top 3 most similar vectors, I define the following function:

# ---------------------------
# Helper function: Query Zilliz with vector search
# ---------------------------
def query_landscape(query_text: str, top_k: int = 3):
    """
    Computes the CLIP text embedding for the given query and performs a vector search in Zilliz.

    Parameters:
      query_text (str): The text query (e.g. "mountain").
      top_k (int): The number of top results to retrieve (default 3).
    
    Returns:
      list: A list of filenames (without extension) from the best matching records.
    """
    # Compute the CLIP text embedding for the query text.
    inputs = clip_processor(text=[query_text], return_tensors="pt", padding=True)
    with torch.no_grad():
        query_vector = clip_model.get_text_features(**inputs)[0].tolist()

    # Define search parameters for cosine similarity.
    search_params = {"metric_type": "COSINE", "params": {"nprobe": 10}}

    try:
        search_results = collection.search(
            data=[query_vector],
            anns_field="embedding",
            param=search_params, 
            limit=top_k,
            # No additional filter here since we're doing a pure vector search.
            output_fields=["filename"]
        )
    except Exception as e: 
        print(f"Vector search error for query '{query_text}': {e}")
        return []
     
    filenames = []
    if search_results and len(search_results) > 0 and len(search_results[0]) > 0:
        for hit in search_results[0]:
            # Access the underlying entity. Depending on your pymilvus version, you might need to use attributes.
            filenames.append(hit.filename)  # Assuming hit.filename holds the stored filename (without extension)
            print(f"🔍 Retrieved filename from Zilliz: {hit.filename}")
    else:
        print(f"No results found for query '{query_text}'.")
    return filenames
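
Before wiring this into an API, the helper can be sanity-checked on its own (assuming the connection and model-loading code above has already run):

matches = query_landscape("snowy mountain at sunset", top_k=3)
print(matches)  # e.g. a list of up to 3 filenames (without extension)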


6. Flask endpoint: handling API requests

I now need to build a route that accepts a user query, that will query the vector search in Zilliz, that will fetch the according data in UltiHash and that will show the user the answer to their query. For that I used the Flask framework:

# ---------------------------
# Initialize Flask Application
# ---------------------------
app = Flask(__name__)

# ---------------------------
# Endpoint: /get_landscape_images
# ---------------------------
@app.route("/get_landscape_images", methods=["POST"])
def get_landscape_images():
    """
    Expects a JSON payload:
    {
      "query": "mountain"
    }
    Performs a vector search in Zilliz for the given query text, retrieves the top 3 matching records,
    fetches the corresponding images from UltiHash (bucket: landscapes) using the filename with '.jpg',
    opens each image locally, and returns a JSON summary of the results.
    """
    data = request.get_json()
    if not data or "query" not in data:
        return jsonify({"error": "Missing 'query' in request."}), 400
        
    query_text = data["query"].strip()
    results = []
    target_dir = Path("/Users/ultihash/test/retrieval-test")
    target_dir.mkdir(parents=True, exist_ok=True)

    # Get top 3 matching filenames from Zilliz.
    filenames = query_landscape(query_text, top_k=3)
    if not filenames:
        return jsonify({"error": "No matching records found."}), 404

    for filename in filenames:
        # Append .jpg to build the expected key in UltiHash.
        file_key = f"{filename}.jpg"
        try:
            response = s3.get_object(Bucket=bucket, Key=file_key)
            file_data = response["Body"].read()
        except Exception as e:
            results.append({"filename": filename, "error": f"Failed to fetch image '{file_key}': {str(e)}"})
            continue

        try:
            # Open the image using Pillow directly from memory.
            image = Image.open(BytesIO(file_data))
            image.show()  # This will open the image using your default image viewer.
        except Exception as e:
            results.append({"filename": filename, "error": f"Failed to open image: {str(e)}"})
            continue
    
        results.append({
            "filename": filename,
            "message": "Image fetched and opened successfully."
        })
        print(f"Processed filename {filename} and opened image {file_key}")
        
    return jsonify({"results": results})            

7. Running the API

All I need to add to run the API is this final snippet:

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000, debug=True)


Wrapping it up

I now have a multimodal retrieval pipeline where:

✅ My raw images are stored in UltiHash.

✅ The image embeddings and metadata are indexed in Zilliz.

✅ The API takes a text query, performs vector search, and retrieves matching images.

You can now test it with a simple POST request:

curl -X POST -H "Content-Type: application/json" -d '{"query": "sunset at the beach"}' http://127.0.0.1:5000/get_landscape_images
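
Or, to stay in Python, the same test with the requests library (an extra dependency, not used in the pipeline itself):

import requests

resp = requests.post(
    "http://127.0.0.1:5000/get_landscape_images",
    json={"query": "sunset at the beach"}
)
print(resp.status_code)
print(resp.json())  # {"results": [{"filename": ..., "message": ...}, ...]}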


Posted by Juliette Lehmann, Founder's Associate