Introduction
Graph databases (GraphDBs) model relationships between data points: they structure data as nodes (entities) and edges (relationships), which makes complex networks easy to traverse. That design is why they power everything from recommendation engines to fraud detection, anywhere the connections between data points matter more than the individual records. It is also what sets them apart from traditional relational databases, which represent connections through rigid tables and costly JOINs.
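To make the node/edge model concrete, here is a toy in-memory property graph in plain Python, with no database involved. The labels and relationship names mirror the ones used in the Cypher query later in this post. The data and the helper `movies_together` are hypothetical and only illustrate how a traversal follows edges directly instead of joining tables:

```python
from collections import defaultdict

# Nodes: id -> (label, properties). Edges: (source, relationship, target).
nodes = {
    "a1": ("Actors", {"actor": "Actor A"}),
    "d1": ("Directors", {"director": "Director D"}),
    "m1": ("Movies", {"movie_title": "Movie 1"}),
    "m2": ("Movies", {"movie_title": "Movie 2"}),
}
edges = [
    ("a1", "Acts in", "m1"),
    ("a1", "Acts in", "m2"),
    ("d1", "Directed", "m1"),
]

# Index outgoing edges per node so traversal is a dictionary lookup, not a JOIN.
outgoing = defaultdict(list)
for src, rel, dst in edges:
    outgoing[src].append((rel, dst))


def movies_together(actor_id, director_id):
    """Return titles of movies the actor acts in AND the director directed."""
    acted = {dst for rel, dst in outgoing[actor_id] if rel == "Acts in"}
    directed = {dst for rel, dst in outgoing[director_id] if rel == "Directed"}
    return sorted(nodes[m][1]["movie_title"] for m in acted & directed)


print(movies_together("a1", "d1"))  # ['Movie 1']
```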
Take AI retrieval pipelines: whether you’re linking knowledge for LLMs, mapping user behavior for recommendations, or tracking interactions in cybersecurity, relationships drive the insights. But GraphDBs aren’t built to store raw data like images, videos, or documents. They handle the metadata and relationships, while the actual files live elsewhere. When a query needs more than structured data, like a product image or a research paper, the file itself comes from a separate storage layer: object storage.
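A minimal sketch of that split, assuming the graph stores only an object key (here a hypothetical `poster_filename` property) while the bytes live in object storage. A plain dict stands in for the object store so the example runs without any services; in the real setup the lookup would be an S3 `get_object` call against UltiHash:

```python
# Stand-in for the object store: key -> raw bytes (in practice, an S3 bucket).
object_store = {
    "poster_1.jpg": b"<raw JPEG bytes>",
}

# Stand-in for a graph query result: metadata plus a pointer (the object key).
graph_record = {"movie_title": "Movie 1", "poster_filename": "poster_1.jpg"}


def resolve_file(record, store, key_field="poster_filename"):
    """Follow the pointer stored in the graph to the raw file in object storage."""
    key = record.get(key_field)
    if key is None:
        return None  # this node has no associated file
    return store[key]


data = resolve_file(graph_record, object_store)
print(len(data), "bytes fetched for", graph_record["movie_title"])
```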
Neo4j is one of the most widely used graph databases, and paired with UltiHash it makes a solid setup for working with connected data and unstructured files: Neo4j takes care of the relationships, mapping how everything connects, while UltiHash handles the heavy lifting of storing and delivering unstructured objects efficiently.
This setup has the best of both worlds: fast, intuitive queries in Neo4j and scalable, high-performance storage with UltiHash. Whether it’s fetching medical scans linked to patient records or surfacing related videos in a knowledge graph, this combination keeps retrieval smooth without storage slowing things down.
I was too curious not to try it out myself, and it absolutely delivered; everything from the graph queries to pulling up images just clicked. You can find the code and what I learned below.
Connecting Neo4j and UltiHash
To demonstrate how to connect a GraphDB with UltiHash, I mapped the connections in Neo4j, stored the images in UltiHash, and built a small Flask API (Flask is a lightweight Python framework for handling HTTP requests) that queries Neo4j and retrieves the associated images from UltiHash.
What you’ll need:
To get everything working, I first made sure both storage and graph layers were set up properly. Here's what that looked like on my end:
UltiHash for object storage
- Deployed a local UltiHash cluster using the test version on Docker (guide)
- Stored the raw images in a bucket named movies (guide)
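For readers who prefer code over the guide, here is a minimal upload sketch. It assumes an S3 client configured for UltiHash (such as the `s3` client created in step 2 below) and uses boto3’s standard `upload_file(Filename, Bucket, Key)` method. The helper name and the choice of the file name as object key are assumptions for illustration:

```python
from pathlib import Path


def upload_images(s3_client, bucket, folder, extensions=(".jpg", ".jpeg")):
    """Upload every image in `folder` to `bucket`, using the file name as the key.

    `s3_client` is any boto3 S3 client, e.g. one pointed at UltiHash.
    Returns the list of keys that were uploaded.
    """
    uploaded = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            # upload_file streams the file from disk (multipart for large files).
            s3_client.upload_file(str(path), bucket, path.name)
            uploaded.append(path.name)
    return uploaded
```

With the client from step 2, a call would look like upload_images(s3, "movies", "./posters"), where ./posters is a hypothetical local folder.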
AuraDB (managed version of Neo4j) for the graph database
- Modelled the data as nodes and relationships in Neo4j
- Imported the dataset into my AuraDB instance
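For reference, here is a sketch of how rows could be loaded with parameterized Cypher. The labels, relationship types and property names are taken from the query used later in this post; the input row format (title, poster, director, and a "|"-separated actors field) and the helper name are assumptions, not the actual import I ran:

```python
# MERGE only creates nodes/relationships that don't exist yet, so re-running
# the import does not duplicate data. Backticks are needed because the
# relationship type `Acts in` contains a space.
IMPORT_MOVIE = """
MERGE (m:Movies {movie_title: $title})
SET m.poster_filename = $poster
MERGE (d:Directors {director: $director})
MERGE (d)-[:`Directed`]->(m)
WITH m
UNWIND $actors AS actor_name
MERGE (a:Actors {actor: actor_name})
MERGE (a)-[:`Acts in`]->(m)
"""


def import_params(row):
    """Map one hypothetical input row to the query parameters."""
    return {
        "title": row["title"],
        "poster": row["poster"],
        "director": row["director"],
        "actors": [name.strip() for name in row["actors"].split("|") if name.strip()],
    }
```

Each row would then be written with the driver from step 1 below, e.g. inside with driver.session() as session: call session.run(IMPORT_MOVIE, import_params(row)).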
1. Setting Up Neo4j Connection
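from neo4j import GraphDatabase
# Connection details for the AuraDB instance (placeholders; use your own)
NEO4J_URI = "neo4j+s://your-link.neo4j.io"
NEO4J_USER = "user"
NEO4J_PASSWORD = "your_password_here"
# One driver per application; it manages a pool of connections to Neo4j
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))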
This creates a driver for the Neo4j AuraDB instance where the nodes, relationships and metadata are stored.
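Hardcoding credentials is fine for a quick test. A common alternative is to read them from environment variables. Here is a minimal sketch, where the variable names are hypothetical and a missing variable fails fast with a KeyError:

```python
import os


def neo4j_settings(env=os.environ):
    """Read Neo4j connection settings from environment variables.

    The variable names are hypothetical; pick whatever fits your deployment.
    Raises KeyError if a required variable is missing, so misconfiguration
    surfaces at startup instead of at the first query.
    """
    return {
        "uri": env["NEO4J_URI"],
        "auth": (env["NEO4J_USER"], env["NEO4J_PASSWORD"]),
    }
```

The driver would then be created with settings = neo4j_settings() and GraphDatabase.driver(settings["uri"], auth=settings["auth"]). Calling driver.verify_connectivity() right after (available in the 5.x Python driver) checks the credentials before the API starts serving requests.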
2. Connecting to UltiHash as Object Storage
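import boto3
# UltiHash exposes an S3-compatible API, so the standard boto3 S3 client works
ULTIHASH_ENDPOINT = "endpoint_url"  # the URL of your UltiHash cluster
ULTIHASH_BUCKET = "movies"  # my bucket name
ULTIHASH_ACCESS_KEY = "YOUR-ACCESS-KEY"
ULTIHASH_SECRET_KEY = "YOUR-SECRET-KEY"
s3 = boto3.client(
    "s3",
    endpoint_url=ULTIHASH_ENDPOINT,  # point boto3 at UltiHash instead of AWS
    aws_access_key_id=ULTIHASH_ACCESS_KEY,
    aws_secret_access_key=ULTIHASH_SECRET_KEY,
)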
This sets up an S3-compatible client for UltiHash, where the images (and any other unstructured data, e.g. videos or PDFs) are stored.
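To check that the client works and the uploads landed where expected, you can list the bucket’s keys. Here is a sketch using boto3’s list_objects_v2 paginator. It assumes UltiHash’s S3 API supports ListObjectsV2, which I have not verified beyond the basic get/put calls used in this post:

```python
def list_keys(s3_client, bucket, prefix=""):
    """Return every object key in `bucket` (optionally under `prefix`).

    list_objects_v2 returns at most 1,000 keys per call, so a paginator is
    used to walk through all pages.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when there are no matching objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

With the client above: list_keys(s3, ULTIHASH_BUCKET).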
3. Querying Neo4j and Retrieving Images from UltiHash
To tie it all together, here’s the code for a lightweight Flask API that handles both the querying and the image retrieval. It takes in a Cypher query and runs it against Neo4j. If any poster_filename fields are returned, it fetches those images from UltiHash on the fly and opens them with Pillow on the machine running the API. The API returns the full query results as JSON, so graph traversal and image fetching happen in one request, with no manual steps.
from io import BytesIO
from flask import Flask, request, jsonify
from PIL import Image
# driver (step 1) and s3 / ULTIHASH_BUCKET (step 2) must be defined in the same module
app = Flask(__name__)
@app.route("/query", methods=["POST"])
def query():
    data = request.get_json(silent=True) or {}  # tolerate missing or invalid JSON bodies
    cypher_query = data.get("query")
    if not cypher_query:
        return jsonify({"error": "Missing query parameter"}), 400
    with driver.session() as session:
        results = session.run(cypher_query)
        # Consume the results while the session is still open
        records = [record.data() for record in results]
    if not records:
        return jsonify({"error": "No results found"}), 404
    # Collect the non-empty poster filenames returned by the query
    poster_filenames = [r["poster_filename"] for r in records if r.get("poster_filename")]
    # Honor the optional "fetch_image" flag from the request body (defaults to fetching)
    if data.get("fetch_image", True) and poster_filenames:
        fetch_and_open_images(poster_filenames)
    return jsonify({"results": records})
def fetch_and_open_images(filenames):
    for filename in filenames:
        # My object keys end in .jpg; append the extension if the stored filename lacks it
        if not filename.lower().endswith((".jpg", ".jpeg")):
            filename += ".jpg"
        try:
            response = s3.get_object(Bucket=ULTIHASH_BUCKET, Key=filename)
            file_data = response["Body"].read()
            image = Image.open(BytesIO(file_data))
            image.show()  # opens in the default image viewer of the machine running Flask
        except Exception as e:
            print(f"Error fetching/opening {filename}: {e}")
if __name__ == "__main__":
    app.run(debug=True)  # debug mode is meant for local development only
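One consequence of this design is that image.show() opens the posters on the machine running Flask, not on the client that sent the request. If the caller needs the images, one alternative is to embed them in the JSON response as base64. This is a sketch of that variation, not the setup I built. The fetch function is passed in so the helper can be tested without UltiHash, and the poster_base64 field name is hypothetical:

```python
import base64


def attach_posters(records, fetch_bytes, field="poster_filename"):
    """Return copies of `records` with each poster embedded as base64 text.

    `fetch_bytes(key)` must return the raw object bytes, for example
    lambda key: s3.get_object(Bucket=ULTIHASH_BUCKET, Key=key)["Body"].read()
    """
    enriched = []
    for record in records:
        record = dict(record)  # don't mutate the caller's data
        key = record.get(field)
        if key:
            record["poster_base64"] = base64.b64encode(fetch_bytes(key)).decode("ascii")
        enriched.append(record)
    return enriched
```

Inside query(), this would replace the fetch_and_open_images call, with the endpoint returning jsonify({"results": attach_posters(records, fetch)}). Base64 inflates payloads by roughly a third. For large files, returning short-lived links created with boto3’s generate_presigned_url may fit better, assuming UltiHash honors presigned requests.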
4. Querying the API
Here’s an example query that answers the question “Which movies have Brad Pitt and Quentin Tarantino worked on together?” and retrieves the poster images for those movies:
curl -X POST "http://127.0.0.1:5000/query" \
-H "Content-Type: application/json" \
-d '{
"query": "MATCH (a:Actors {actor: \"Brad Pitt\"})-[:`Acts in`]->(m:Movies)<-[:`Directed`]-(d:Directors {director: \"Quentin Tarantino\"}) RETURN m.movie_title AS movie_title, m.poster_filename AS poster_filename",
"fetch_image": true
}'
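The same request can be sent from Python using only the standard library. In this sketch, json.dumps handles the escaping of the double quotes inside the Cypher string, which the shell version has to do by hand. The URL matches Flask’s default development address used above, and the helper names are hypothetical:

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/query"  # Flask's default development address


def build_query_request(cypher, fetch_image=True, url=API_URL):
    """Build the same POST request as the curl example."""
    body = json.dumps({"query": cypher, "fetch_image": fetch_image}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(cypher, fetch_image=True, url=API_URL):
    """Send the request and return the parsed JSON response.

    urlopen raises urllib.error.HTTPError for the API's 400/404 responses.
    """
    with urllib.request.urlopen(build_query_request(cypher, fetch_image, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```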
Neo4j is built around Cypher, a query language designed specifically for graph data. It is optimized for pattern-based queries, which makes it easy to express relationships and navigate connected data efficiently. The API takes in any Cypher query, executes it on the database, and returns structured results. You interact with Neo4j exactly as you normally would, with no new syntax and no changes to how you query. The only difference? When your query returns references to associated files (like movie posters), those files are retrieved directly from UltiHash, so you don’t have to manage raw data retrieval separately (and manually!).
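Because the endpoint executes whatever Cypher it receives, anyone who can reach it can also modify or delete data. If the API is meant to be read-only, a crude safeguard is to reject queries that contain write clauses before they reach Neo4j. The keyword check below is only a heuristic. It can be bypassed, for example through procedure calls, and it rejects a label or property that happens to be named like a clause. Where your Neo4j edition supports it, connecting with a read-only user is the sturdier option:

```python
import re

# Cypher clauses that modify data or schema, matched as whole words,
# case-insensitively (a heuristic list, not an exhaustive one).
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b", re.IGNORECASE
)
# Single- or double-quoted string literals, allowing backslash escapes.
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")


def looks_read_only(cypher):
    """Heuristic: True if no write clause appears outside string literals."""
    without_strings = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSES.search(without_strings) is None
```

In query(), a request failing this check could be answered with a 400 before a session is opened.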
The Bigger Picture: AI Pipelines at Scale
By following the steps above, you can integrate Neo4j and UltiHash into a setup where structured relationships live in a GraphDB while unstructured data is offloaded to scalable object storage. This architecture supports fast retrieval, reduces storage costs through UltiHash’s built-in deduplication, and helps AI-driven applications such as recommendation systems, fraud detection, and knowledge graphs operate reliably at scale.
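To illustrate the general idea behind deduplication, here is a toy content-addressed store. It does not reproduce UltiHash’s actual algorithm. Data is split into fixed-size chunks, each chunk is identified by its SHA-256 hash, and identical chunks are stored only once. The chunk size and class name are arbitrary choices for the example, and production systems often use variable-size, content-defined chunking instead:

```python
import hashlib


class ToyDedupStore:
    """Content-addressed store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}   # sha256 hex digest -> chunk bytes
        self.objects = {}  # object key -> ordered list of chunk digests

    def put(self, key, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the chunk only if unseen
            digests.append(digest)
        self.objects[key] = digests

    def get(self, key):
        return b"".join(self.chunks[d] for d in self.objects[key])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = ToyDedupStore()
store.put("a.jpg", b"AAAABBBBCCCC")
store.put("b.jpg", b"AAAABBBBDDDD")
print(store.stored_bytes())  # 16: 24 logical bytes, but AAAA and BBBB are shared
```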
Interested in testing this setup? Try UltiHash and start building your AI project today.