Why does vector search need object storage as its foundation?
Learn how to build a high-performance retrieval pipeline with UltiHash and Zilliz for fast multimodal search across text + images.
Vector search is changing the way we find information, particularly for unstructured data, by complementing traditional SQL-based queries with similarity search rather than exact matches. Instead of just matching keywords, it uses embeddings - numerical representations that capture the meaning and context of the data. This lets you search by similarity, so you get relevant results even if the words don't exactly match.
Think about image search: instead of looking for a specific file name, you can simply type “snowy mountain at sunset” and instantly see the most relevant results. That's because vector search organizes data in a way that truly understands what you're looking for, not just matching text.
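To make "similar meaning, different words" concrete, here is a tiny illustration (not part of the pipeline below, and the phrases are just made-up examples): encode a few phrases with CLIP and compare them with cosine similarity.

# Illustration: semantically close phrases get a higher cosine similarity than unrelated ones.
import torch
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

phrases = ["snowy mountain at sunset", "alpine peak in evening light", "a bowl of tomato soup"]
inputs = processor(text=phrases, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize so the dot product is the cosine similarity

print(f"related:   {(emb[0] @ emb[1]).item():.3f}")  # typically the higher score
print(f"unrelated: {(emb[0] @ emb[2]).item():.3f}")  # typically the lower score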
But here’s the catch - vector databases don’t actually store the raw data. They index those embeddings, while the actual images, videos, or documents reside in object storage. That storage layer must be highly scalable and fast; otherwise, search performance suffers, costs rise, and scaling becomes a real headache.
Object storage is the best foundation for vector search because:
✅ Scales effortlessly - Handles billions of objects without a drop in performance.
✅ Separates storage & compute - Keeps search fast and costs under control.
✅ Built for retrieval - AI-powered search, recommendations, and discovery demand instant data access.
If vector search is the engine, object storage is the fuel. Without the right storage layer, retrieval slows down, costs spiral, and scaling becomes a problem.
When building a retrieval pipeline, the choice of storage and search tools matters. I picked UltiHash and Zilliz because they each solve a specific part of the problem.
UltiHash is high-performance object storage, making it efficient to store and retrieve large datasets like images.
Zilliz is built for vector search - it's the managed version of the open-source Milvus. Since I'm using CLIP embeddings, I needed a database that could handle high-dimensional, multimodal search effectively.
With UltiHash storing the raw images and Zilliz handling the embeddings, this setup ensures efficient storage while enabling fast similarity search. Okay, theory aside, let’s test it… I documented the steps from my little experiment setting up vector search on an image dataset, using Zilliz as the vector DB and UltiHash as object storage.
When setting up my vector search, I started with the foundation: a way to store and retrieve raw data efficiently so that fast, semantic search is possible. The raw data - images, videos, or documents - isn't stored inside the vector database itself. Instead, embeddings are indexed separately, while the actual files live in object storage. UltiHash works well here: it is scalable, I can deploy it anywhere with Docker or Kubernetes, and it offers high-throughput performance, especially on reads.
Before you begin, create a folder at /Users/ultihash/test (for storing images and metadata) or update the code with your own folder path.
A. UltiHash setup
First, I deployed an UltiHash cluster to store the raw images. I did this on my local machine, using the test version of UltiHash on Docker.
✅ Store raw images in my UltiHash bucket ‘landscapes’ - Guide
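The guide linked above walks through the upload in detail. For reference, since UltiHash speaks the S3 API, pushing the images into the 'landscapes' bucket can look roughly like this with boto3 (a sketch only; the local endpoint and credential environment variables are the same ones used later in the article, and you may also need the license header shown in section C.3):

# Sketch: upload local images to the 'landscapes' bucket over the S3-compatible API.
import os
import boto3
from pathlib import Path

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:8080",  # local UltiHash test cluster, adjust if necessary
    aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
)

s3.create_bucket(Bucket="landscapes")  # skip if the bucket already exists

for image_path in Path("/Users/ultihash/test/landscapes_test").glob("*.jpg"):
    s3.upload_file(str(image_path), "landscapes", image_path.name)
    print(f"Uploaded {image_path.name}")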
B.1 Do you have your images’ embeddings?
I used CLIP to generate embeddings for the images, allowing them to be matched against text queries later. This step enables semantic search, where results are based on meaning rather than exact terms.
I prepared this script in case you need to extract embeddings from your data:
Load images from /Users/ultihash/test/landscapes_test
Generate embeddings with CLIP
Save the metadata (filenames + embeddings) in a JSON file /Users/ultihash/test/landscapes_metadata.json
#!/usr/bin/env python3
import os
import json
import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
from pathlib import Path
from tqdm import tqdm

def main():
    # Define the input directory containing your landscape images.
    input_dir = Path("/Users/ultihash/test/landscapes_test")
    if not input_dir.exists():
        print(f"Input directory {input_dir} does not exist.")
        return

    # List only image files with supported extensions.
    supported_extensions = {".jpg", ".jpeg", ".png", ".bmp"}
    image_files = [p for p in input_dir.glob("*") if p.suffix.lower() in supported_extensions]
    if not image_files:
        print("No image files found in the directory.")
        return

    print(f"Found {len(image_files)} images in {input_dir}.")

    # Load the CLIP model and processor.
    # We use the 'openai/clip-vit-base-patch32' model for generating image embeddings.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    model.eval()  # Set model to evaluation mode

    metadata = []

    # Process each image (using a progress bar for feedback).
    for image_path in tqdm(image_files, desc="Processing images"):
        try:
            # Open the image and ensure it's in RGB mode.
            image = Image.open(image_path).convert("RGB")
        except Exception as e:
            print(f"Error opening image {image_path}: {e}")
            continue

        # Prepare the image for the model.
        inputs = processor(images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            # Generate the image embedding.
            image_features = model.get_image_features(**inputs)

        # Convert the tensor embedding to a list of floats.
        embedding = image_features[0].tolist()

        # Create a metadata entry containing the filename and the embedding.
        entry = {
            "filename": image_path.name,
            "embedding": embedding
        }
        metadata.append(entry)

    # Define the output JSON file path.
    output_file = Path("/Users/ultihash/test/landscapes_metadata.json")

    # Save the metadata to a JSON file with indentation for readability.
    with open(output_file, "w") as f:
        json.dump(metadata, f, indent=2)

    print(f"Metadata for {len(metadata)} images saved to {output_file}.")


if __name__ == "__main__":
    main()
B.2 Vector Database setup (Zilliz)
With raw images in UltiHash, the next step was indexing embeddings in a vector database. I used Zilliz Free Plan (a managed version of Milvus) to store and search embeddings efficiently.
✅ Import my metadata collection (including embeddings)
Python Script: Get your metadata in the format requested by Zilliz
📌 What this script does:
Reads the original metadata file from /Users/ultihash/test/landscapes_metadata.json
Removes the .jpg extension from filenames
Structures the data into Zilliz's required format
Saves the transformed metadata as /Users/ultihash/test/landscape_metadata_col.json, ready for insertion into Zilliz
#!/usr/bin/env python3
import os
import json
from pathlib import Path

def main():
    # Define input and output file paths.
    input_file = Path("/Users/ultihash/test/landscapes_metadata.json")
    output_file = Path("/Users/ultihash/test/landscape_metadata_col.json")

    # Read the original metadata file.
    with open(input_file, "r") as f:
        metadata_list = json.load(f)

    # Transform the metadata entries.
    transformed_data = []
    for entry in metadata_list:
        # Remove the .jpg extension from the filename.
        filename = entry.get("filename", "")
        base_filename, _ = os.path.splitext(filename)
        transformed_entry = {
            "filename": base_filename,
            "embedding": entry.get("embedding", [])
        }
        transformed_data.append(transformed_entry)

    # Create the final structure for Zilliz insertion.
    zilliz_format = {
        "collectionName": "landscapes",
        "data": transformed_data
    }

    # Write the transformed metadata to the new JSON file.
    with open(output_file, "w") as f:
        json.dump(zilliz_format, f, indent=2)

    print(f"Transformed metadata saved to {output_file}")


if __name__ == "__main__":
    main()
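I imported the resulting JSON through the Zilliz console. If you prefer to create the collection and insert the data programmatically, here is a rough pymilvus sketch; the schema (a string primary key plus a 512-dimensional float vector, matching CLIP ViT-B/32) and the IVF_FLAT index are my assumptions, so adapt them to your setup:

# Sketch: create the 'landscapes' collection in Zilliz and insert the transformed metadata.
import json
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(alias="default",
                    uri="https://<your-zilliz-uri>",   # Replace with your Zilliz cluster URI
                    token="<your-zilliz-token>")       # Replace with your Zilliz access token

fields = [
    FieldSchema(name="filename", dtype=DataType.VARCHAR, max_length=256, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=512),  # CLIP ViT-B/32 output size
]
collection = Collection("landscapes", CollectionSchema(fields))

with open("/Users/ultihash/test/landscape_metadata_col.json") as f:
    payload = json.load(f)

# Column-based insert: one list per field, in schema order.
collection.insert([
    [row["filename"] for row in payload["data"]],
    [row["embedding"] for row in payload["data"]],
])

collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "COSINE", "params": {"nlist": 128}},
)
collection.load()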
C. Building a multimodal (text-image) retrieval pipeline
Now that I have my raw data stored in UltiHash and Zilliz handling the metadata and embeddings, I tested their integration. I used a small Flask API to orchestrate the queries from the user to the vector database and to UltiHash. Flask is a lightweight Python web framework that requires little setup and lets me verify that my retrieval pipeline works as intended. The API:
1️⃣ Accepts a text query (e.g., "sunset at the beach")
2️⃣ Generates an embedding from the user’s text query using CLIP
3️⃣ Performs a vector search and identifies the top 3 similar embeddings that are stored in Zilliz (matching the query embedding with stored image embeddings)
4️⃣ Retrieves the 3 most similar matching images from UltiHash
Let’s break it down step by step.
1. Setting up the retrieval pipeline
📦 Imports & Dependencies
First, I imported the following libraries and dependencies. I included a short description of what each does.
#!/usr/bin/env python3
import os        # Handles file paths and environment variables
import sys       # Accesses system functions and command-line arguments
import json      # Reads/writes JSON data
import base64    # Encodes/decodes binary data as text
import torch     # Loads CLIP for text-to-embedding conversion
import argparse  # Parses command-line arguments
import boto3     # Retrieves images from UltiHash
from pathlib import Path                             # Works with file system paths
from flask import Flask, request, jsonify            # Framework to handle API requests
from pymilvus import connections, Collection         # Connects to Zilliz
from transformers import CLIPProcessor, CLIPModel    # Loads CLIP for text-to-embedding conversion
from io import BytesIO                               # Handles in-memory image data
from PIL import Image                                # Handles image processing
2. Connecting to Zilliz (vector database)
After importing all the required libraries, I start by establishing a connection to Zilliz and specifying the collection name.
For that step, I’ll input my Zilliz URI and my Zilliz token (which are both found in My Cluster Details):
# ---------------------------
# Connect to Zilliz (Milvus)
# ---------------------------
connections.connect(
    alias="default",
    uri="https://<your-zilliz-uri>",   # Replace with your Zilliz cluster URI
    token="<your-zilliz-token>"        # Replace with your Zilliz access token
)
print("✅ Connected to Zilliz!")

collection_name = "landscapes"
collection = Collection(collection_name)  # Access the correct collection
3. Connecting to UltiHash (object storage for raw data)
Then, I set up the connection to UltiHash. It is S3 API compatible, so after importing boto3 (step 1), I just need to configure my boto3 S3 client. For that, I need my AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and my UH_LICENSE_STRING (all found in my UltiHash dashboard). In this step I also define my bucket name.
# ---------------------------
# Set up boto3 S3 Client for UltiHash
# ---------------------------
s3 = boto3.client(
    's3',
    endpoint_url="http://127.0.0.1:8080",  # Adjust if necessary
    aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY")
)

# Ensure API calls include the UltiHash license
def add_license_header(request, **kwargs):
    request.headers["UH_LICENSE_STRING"] = "<your-ultihash-license>"  # Replace with your license key

s3.meta.events.register("before-sign.s3", add_license_header)

bucket = "landscapes"  # Set the correct UltiHash bucket
4. Loading CLIP for querying
To run a vector search, I first need to generate embeddings from my text queries. Since my queries are text and the data I’m retrieving are images, I need a multimodal embedding model. For this, I chose CLIP: it generates an embedding from my text query, which is then compared to the image embeddings stored in Zilliz.
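The helper function in the next step references clip_model and clip_processor. The original script doesn't show where they are created, but loading them mirrors the embedding script from B.1 (same checkpoint assumed), using the imports from step 1:

# Load CLIP once at startup so each incoming text query can be embedded on the fly.
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_model.eval()  # inference only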
5. Querying Zilliz with vector search
To extract the embedding from my text query and compare it with the embeddings stored in Zilliz, returning the top 3 most similar vectors, I define the following function:
# ---------------------------
# Helper function: Query Zilliz with vector search
# ---------------------------
def query_landscape(query_text: str, top_k: int = 3):
    """
    Computes the CLIP text embedding for the given query and performs a vector search in Zilliz.

    Parameters:
        query_text (str): The text query (e.g. "mountain").
        top_k (int): The number of top results to retrieve (default 3).

    Returns:
        list: A list of filenames (without extension) from the best matching records.
    """
    # Compute the CLIP text embedding for the query text.
    inputs = clip_processor(text=[query_text], return_tensors="pt", padding=True)
    with torch.no_grad():
        query_vector = clip_model.get_text_features(**inputs)[0].tolist()

    # Define search parameters for cosine similarity.
    search_params = {"metric_type": "COSINE", "params": {"nprobe": 10}}

    try:
        search_results = collection.search(
            data=[query_vector],
            anns_field="embedding",
            param=search_params,
            limit=top_k,
            # No additional filter here since we're doing a pure vector search.
            output_fields=["filename"]
        )
    except Exception as e:
        print(f"Vector search error for query '{query_text}': {e}")
        return []

    filenames = []
    if search_results and len(search_results) > 0 and len(search_results[0]) > 0:
        for hit in search_results[0]:
            # Access the underlying entity. Depending on your pymilvus version, you might need to use attributes.
            filenames.append(hit.filename)  # Assuming hit.filename holds the stored filename (without extension)
            print(f"🔍 Retrieved filename from Zilliz: {hit.filename}")
    else:
        print(f"No results found for query '{query_text}'.")

    return filenames
6. Flask endpoint: handling API requests
I now need to build a route that accepts a user query, runs the vector search in Zilliz, fetches the corresponding data from UltiHash, and returns the answer to the user. For that I used the Flask framework:
# ---------------------------
# Initialize Flask Application
# ---------------------------
app = Flask(__name__)

# ---------------------------
# Endpoint: /get_landscape_images
# ---------------------------
@app.route("/get_landscape_images", methods=["POST"])
def get_landscape_images():
    """
    Expects a JSON payload:
        {
            "query": "mountain"
        }

    Performs a vector search in Zilliz for the given query text, retrieves the top 3 matching records,
    fetches the corresponding images from UltiHash (bucket: landscapes) using the filename with '.jpg',
    and returns a JSON summary of the results (each image is opened locally with the default viewer).
    """
    data = request.get_json()
    if not data or "query" not in data:
        return jsonify({"error": "Missing 'query' in request."}), 400

    query_text = data["query"].strip()
    results = []

    # Local folder for retrieved images (created here, though this version only displays them).
    target_dir = Path("/Users/ultihash/test/retrieval-test")
    target_dir.mkdir(parents=True, exist_ok=True)

    # Get top 3 matching filenames from Zilliz.
    filenames = query_landscape(query_text, top_k=3)
    if not filenames:
        return jsonify({"error": "No matching records found."}), 404

    for filename in filenames:
        # Append .jpg to build the expected key in UltiHash.
        file_key = f"{filename}.jpg"
        try:
            response = s3.get_object(Bucket=bucket, Key=file_key)
            file_data = response["Body"].read()
        except Exception as e:
            results.append({"filename": filename, "error": f"Failed to fetch image '{file_key}': {str(e)}"})
            continue

        try:
            # Open the image using Pillow directly from memory.
            image = Image.open(BytesIO(file_data))
            image.show()  # This will open the image using your default image viewer.
        except Exception as e:
            results.append({"filename": filename, "error": f"Failed to open image: {str(e)}"})
            continue

        results.append({
            "filename": filename,
            "message": "Image fetched and opened successfully."
        })
        print(f"Processed filename {filename} and opened image {file_key}")

    return jsonify({"results": results})
7. Running the API
All I need to add to run the API is this snippet:
if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000, debug=True)
Wrapping it up
I now have a multimodal retrieval pipeline where:
✅ My raw images are stored in UltiHash.
✅ The image embeddings and metadata are indexed in Zilliz.
✅ The API takes a text query, performs vector search, and retrieves matching images.
You can now test it with a simple POST request:
curl -X POST -H "Content-Type: application/json" -d '{"query": "sunset at the beach"}' http://127.0.0.1:5000/get_landscape_images
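Or, the equivalent call from Python with the requests library (assuming the Flask app is running locally on port 5000):

import requests

resp = requests.post(
    "http://127.0.0.1:5000/get_landscape_images",
    json={"query": "sunset at the beach"},
)
print(resp.status_code)
print(resp.json())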