Store Large Files
On this page
Overview
In this guide, you can learn how to store and retrieve large files in MongoDB by using GridFS. GridFS is a specification implemented by the MongoDB PHP Library that describes how to split files into chunks when storing them and reassemble them when retrieving them. The library's implementation of GridFS is an abstraction that manages the operations and organization of the file storage.
Use GridFS if the size of your files exceeds the BSON document size limit of 16MB. For more detailed information on whether GridFS is suitable for your use case, see GridFS in the MongoDB Server manual.
How GridFS Works
GridFS organizes files in a bucket, a group of MongoDB collections that contain the chunks of files and information describing them. The bucket contains the following collections, named using the convention defined in the GridFS specification:
The
chunks
collection stores the binary file chunks.The
files
collection stores the file metadata.
When you create a new GridFS bucket, the library creates the preceding
collections, prefixed with the default bucket name fs
, unless
you specify a different name. The library also creates an index on each
collection to ensure efficient retrieval of the files and related
metadata. The library creates the GridFS bucket, if it doesn't exist, only when the first write
operation is performed. The library creates indexes only if they don't exist and when the
bucket is empty. For more information about
GridFS indexes, see GridFS Indexes
in the MongoDB Server manual.
When using GridFS to store files, the library splits the files into smaller
chunks, each represented by a separate document in the chunks
collection.
It also creates a document in the files
collection that contains
a file ID, file name, and other file metadata. You can upload the file by passing
a stream to the MongoDB PHP Library to consume or creating a new stream and writing to it
directly. To learn more about streams, see Streams
in the PHP manual.
View the following diagram to see how GridFS splits the files when uploaded to a bucket:
When retrieving files, GridFS fetches the metadata from the files
collection in the specified bucket and uses the information to reconstruct
the file from documents in the chunks
collection. You can read the file
by writing its contents to an existing stream or creating a new stream that points
to the file.
Create a GridFS Bucket
To store or retrieve files from GridFS, call the MongoDB\Database::selectGridFSBucket()
method on your database. This method accesses an existing bucket or creates
a new bucket if one does not exist.
The following example calls the selectGridFSBucket()
method on the db
database:
$bucket = $client->db->selectGridFSBucket();
Customize the Bucket
You can customize the GridFS bucket configuration by passing an array that specifies
option values to the selectGridFSBucket()
method. The following table describes
some options you can set in the array:
Option | Description |
---|---|
bucketName | Specifies the bucket name to use as a prefix for the files and chunks collections.
The default value is 'fs' .Type: string |
chunkSizeBytes | Specifies the chunk size that GridFS splits files into. The default value is 261120 .Type: integer |
readConcern | Specifies the read concern to use for bucket operations. The default value is the
database's read concern. Type: MongoDB\Driver\ReadConcern |
readPreference | Specifies the read preference to use for bucket operations. The default value is the
database's read preference. Type: MongoDB\Driver\ReadPreference |
writeConcern | Specifies the write concern to use for bucket operations. The default value is the
database's write concern. Type: MongoDB\Driver\WriteConcern |
The following example creates a bucket named 'myCustomBucket'
by passing an
array to selectGridFSBucket()
that sets the bucketName
option:
$custom_bucket = $client->db->selectGridFSBucket( ['bucketName' => 'myCustomBucket'] );
Upload Files
You can upload files to a GridFS bucket by using the following methods:
MongoDB\GridFS\Bucket::openUploadStream()
: Opens a new upload stream to which you can write file contentsMongoDB\GridFS\Bucket::uploadFromStream()
: Uploads the contents of an existing stream to a GridFS file
Write to an Upload Stream
Use the openUploadStream()
method to create an upload stream for a given
file name. The openUploadStream()
method allows you to specify configuration
information in an options array, which you can pass as a parameter.
This example uses an upload stream to perform the following actions:
Opens a writable stream for a new GridFS file named
'my_file'
Sets the
metadata
option in an array parameter to theopenUploadStream()
methodCalls the
fwrite()
method to write data to'my_file'
, which the stream points toCalls the
fclose()
method to close the stream pointing to'my_file'
$stream = $bucket->openUploadStream('my_file', [ 'metadata' => ['contentType' => 'text/plain'] ]); fwrite($stream, 'Data to store'); fclose($stream);
Upload an Existing Stream
Use the uploadFromStream()
method to upload the contents of a stream to
a new GridFS file. The uploadFromStream()
method allows you to specify configuration
information in an options array, which you can pass as a parameter.
This example performs the following actions:
Calls the
fopen()
method to open a file located at/path/to/input_file
as a stream in binary read (rb
) modeCalls the
uploadFromStream()
method to upload the contents of the stream to a GridFS file named'new_file'
$file = fopen('/path/to/input_file', 'rb'); $bucket->uploadFromStream('new_file', $file);
Retrieve File Information
In this section, you can learn how to retrieve file metadata stored in the
files
collection of the GridFS bucket. The metadata contains information
about the file it refers to, including:
The
_id
of the fileThe name of the file
The length/size of the file
The upload date and time
A
metadata
document in which you can store any other information
To retrieve files from a GridFS bucket, call the MongoDB\GridFS\Bucket::find()
method on the MongoDB\GridFS\Bucket
instance. The method returns a MongoDB\Driver\Cursor
instance from which you can access the results. To learn more about Cursor
objects in
the MongoDB PHP Library, see the Access Data From a Cursor guide.
Example
The following code example shows you how to retrieve and print file metadata
from files in a GridFS bucket. It uses a foreach
loop to iterate through
the returned cursor and display the contents of the files uploaded in the
Upload Files examples:
$files = $bucket->find(); foreach ($files as $file_doc) { echo toJSON($file_doc), PHP_EOL; }
{ "_id" : { "$oid" : "..." }, "chunkSize" : 261120, "filename" : "my_file", "length" : 13, "uploadDate" : { ... }, "metadata" : { "contentType" : "text/plain" }, "md5" : "6b24249b03ea3dd176c5a04f037a658c" } { "_id" : { "$oid" : "..." }, "chunkSize" : 261120, "filename" : "new_file", "length" : 13, "uploadDate" : { ... }, "md5" : "6b24249b03ea3dd176c5a04f037a658c" }
The find()
method accepts various query specifications. You can use its
$options
parameter to specify the sort order, maximum number of documents to return,
and the number of documents to skip before returning. To view a list of available
options, see the API documentation.
Note
The preceding example calls the toJSON()
method to print file metadata as
Extended JSON, defined in the following code:
function toJSON(object $document): string { return MongoDB\BSON\Document::fromPHP($document)->toRelaxedExtendedJSON(); }
Download Files
You can download files from a GridFS bucket by using the following methods:
MongoDB\GridFS\Bucket::openDownloadStreamByName()
orMongoDB\GridFS\Bucket::openDownloadStream()
: Opens a new download stream from which you can read the file contentsMongoDB\GridFS\Bucket::downloadToStream()
: Writes the entire file to an existing download stream
Read From a Download Stream
You can download files from your MongoDB database by using the
MongoDB\GridFS\Bucket::openDownloadStreamByName()
method to
create a download stream.
This example uses a download stream to perform the following actions:
Selects a GridFS file named
'my_file'
, uploaded in the Write to an Upload Stream example, and opens it as a readable streamCalls the
stream_get_contents()
method to read the contents of'my_file'
Prints the file contents
Calls the
fclose()
method to close the download stream pointing to'my_file'
$stream = $bucket->openDownloadStreamByName('my_file'); $contents = stream_get_contents($stream); echo $contents, PHP_EOL; fclose($stream);
"Data to store"
Note
If there are multiple documents with the same file name,
GridFS will stream the most recent file with the given name (as
determined by the uploadDate
field).
Alternatively, you can use the MongoDB\GridFS\Bucket::openDownloadStream()
method, which takes the _id
field of a file as a parameter:
$stream = $bucket->openDownloadStream(new ObjectId('66e0a5487c880f844c0a32b1')); $contents = stream_get_contents($stream); fclose($stream);
Note
The GridFS streaming API cannot load partial chunks. When a download stream needs to pull a chunk from MongoDB, it pulls the entire chunk into memory. The 255-kilobyte default chunk size is usually sufficient, but you can reduce the chunk size to reduce memory overhead or increase the chunk size when working with larger files. For more information about setting the chunk size, see the Customize the Bucket section of this page.
Download a File Revision
When your bucket contains multiple files that share the same file name, GridFS chooses the most recently uploaded version of the file by default. To differentiate between each file that shares the same name, GridFS assigns them a revision number, ordered by upload time.
The original file revision number is 0
and the next most recent file revision
number is 1
. You can also specify negative values that correspond to the recency
of the revision. The revision value -1
references the most recent revision and -2
references the next most recent revision.
You can instruct GridFS to download a specific file revision by passing an options
array to the openDownloadStreamByName()
method and specifying the revision
option. The following example reads the contents of the original file named
'my_file'
rather than the most recent revision:
$stream = $bucket->openDownloadStreamByName('my_file', ['revision' => 0]); $contents = stream_get_contents($stream); fclose($stream);
Download to an Existing Stream
You can download the contents of a GridFS file to an existing stream
by calling the MongoDB\GridFS\Bucket::downloadToStream()
method
on your bucket.
This example performs the following actions:
Calls the
fopen()
method to open a file located at/path/to/output_file
as a stream in binary write (wb
) modeDownloads a GridFS file that has an
_id
value ofObjectId('66e0a5487c880f844c0a32b1')
to the stream
$file = fopen('/path/to/output_file', 'wb'); $bucket->downloadToStream( new ObjectId('66e0a5487c880f844c0a32b1'), $file, );
Rename Files
Use the MongoDB\GridFS\Bucket::rename()
method to update the name of
a GridFS file in your bucket. You must specify the file to rename by its
_id
field rather than its file name.
The following example shows how to update the filename
field to
'new_file_name'
by referencing a document's _id
field:
$bucket->rename(new ObjectId('66e0a5487c880f844c0a32b1'), 'new_file_name');
Note
File Revisions
The rename()
method supports updating the name of only one file at
a time. If you want to rename each file revision, or files with different upload
times that share the same file name, collect the _id
values of each revision.
Then, pass each value in separate calls to the rename()
method.
Delete Files
Use the MongoDB\GridFS\Bucket::delete()
method to remove a file's collection
document and associated chunks from your bucket. This effectively deletes the file.
You must specify the file by its _id
field rather than its file name.
The following example shows you how to delete a file by referencing its _id
field:
$bucket->delete(new ObjectId('66e0a5487c880f844c0a32b1'));
Note
File Revisions
The delete()
method supports deleting only one file at a time. If
you want to delete each file revision, or files with different upload
times that share the same file name, collect the _id
values of each revision.
Then, pass each value in separate calls to the delete()
method.
API Documentation
To learn more about using the MongoDB PHP Library to store and retrieve large files, see the following API documentation: