Attach one Active Storage blob to multiple files

In Rails, when attaching a file to an object, is there a way to check whether it is already attached to another object? If so, can I reuse that existing attachment for the second object? The goal is to avoid uploading the same file to the storage backend more than once.

Specifically, this comes up when files are attached in the development environment using file_fixture_upload.



Solution 1:[1]

You can attach an existing blob to an object, as in the example person.avatar.attach(avatar_blob) from the #attach Rails documentation: https://api.rubyonrails.org/classes/ActiveStorage/Attached/One.html#method-i-attach
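
A minimal sketch of reusing one record's blob on another record. The helper name share_attachment is made up for this example, and person / other_person in the usage comment are assumed records with has_one_attached :avatar. The helper only relies on #attached?, #blob and #attach, which the has_one_attached proxy provides.

```ruby
# Hypothetical helper: reuse the blob behind one attachment for another record.
# Both arguments are expected to behave like has_one_attached proxies
# (they respond to #attached?, #blob and #attach), e.g. person.avatar.
def share_attachment(source, target)
  return false unless source.attached?

  target.attach(source.blob) # attaches the existing blob, so nothing is uploaded again
  true
end

# Usage in a Rails app (person and other_person are assumed records):
#   share_attachment(person.avatar, other_person.avatar)
```

Both records then point at the same row in active_storage_blobs through separate attachment rows.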

But the key problem is how to tell whether the file has already been uploaded and, if so, where the corresponding blob is.

There are two possible approaches:

  1. Try to find the blob using the data stored in the active_storage_blobs table.
  2. Create another model to track and maintain the file upload status.

Sol 1: Try to find the blob using the data stored in the active_storage_blobs table.

The active_storage_blobs schema is:

  create_table "active_storage_blobs", force: :cascade do |t|
    t.string "key", null: false
    t.string "filename", null: false
    t.string "content_type"
    t.text "metadata"
    t.bigint "byte_size", null: false
    t.string "checksum"
    t.datetime "created_at", precision: nil, null: false
    t.string "service_name"
    t.index ["key"], name: "index_active_storage_blobs_on_key", unique: true
  end

It is possible to find the blob by filename or checksum, but you have to implement the query yourself. A few things to be aware of:

  1. filename and checksum are not indexed by default, so lookups may become slow as the number of blobs grows. Consider adding an index if needed.
  2. You need to know how Active Storage generates the filename and checksum, and apply the same computation to the file when querying.
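
As a rough illustration of the second point: as far as I know, Active Storage stores the checksum as a base64-encoded MD5 digest of the file contents, so a lookup has to compute the same value first. The sketch below uses only Ruby's standard library. The helper name blob_checksum_for and the chunk size are made up for this example, and the commented-out lookup assumes a Rails app with Active Storage loaded.

```ruby
require "digest"
require "stringio"

# Hypothetical helper: base64-encoded MD5 of an IO, read in chunks so large
# files are not loaded into memory at once. This is assumed to match the
# value Active Storage writes to active_storage_blobs.checksum.
def blob_checksum_for(io, chunk_size: 5 * 1024 * 1024)
  digest = Digest::MD5.new
  io.rewind
  while (chunk = io.read(chunk_size))
    digest << chunk
  end
  io.rewind # leave the IO ready for a later upload
  digest.base64digest
end

checksum = blob_checksum_for(StringIO.new("hello"))
puts checksum

# In a Rails app, the lookup could then look like this (nil when nothing matches):
#   existing_blob = ActiveStorage::Blob.find_by(checksum: checksum)
```

If a blob comes back, it can be passed straight to attach. Matching on checksum alone treats any two files with identical contents as the same file, regardless of filename.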

Sol 2: Create another model to trace and maintain the file upload status

The first solution is a bit of a hack, because active_storage_blobs is supposed to be internal and a Rails app should not query it directly.

A better way is to create a model that tracks the uploaded files, and to decide what information to store for looking a file up. For example, we can create an UploadedFile model with a checksum field:

db schema

  create_table "uploaded_files", force: :cascade do |t|
    t.string "checksum", null: false
    t.index ["checksum"], name: "index_uploaded_files_on_checksum", unique: true
  end

Then implement the related methods to handle the upload, as below:

app/models/uploaded_file.rb

class UploadedFile < ApplicationRecord
  has_one_attached(:uploaded_file)

  class << self
    def find_or_create_file_blob(file)
      blob = find_blob_by_given_file(file)
      if blob.blank?
        blob = create_uploaded_file_and_do_upload(file)
      end
      blob
    end

    private

    def find_blob_by_given_file(file)
      checksum = calculate_checksum(file)
      existing_uf = UploadedFile.where(checksum:).take
      existing_uf.present? ? existing_uf.uploaded_file.blob : nil
    end

    def create_uploaded_file_and_do_upload(file)
      checksum = calculate_checksum(file)
      uf = UploadedFile.create(checksum:)
      uf.uploaded_file.attach(file)
      if uf.uploaded_file.attached?
        return uf.uploaded_file.blob
      else
        # do error handling when upload failed
      end
    end

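    def calculate_checksum(file)
      # One possible implementation, assuming `file` responds to #path
      # (as uploaded files do): a base64-encoded MD5 digest of the contents.
      Digest::MD5.file(file.path).base64digest
    end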
  end
  # ...
end

Finally, when uploading a file to person.avatar, instead of updating person.avatar directly, use UploadedFile.find_or_create_file_blob to find the blob first, then attach it to person.avatar.

blob = UploadedFile.find_or_create_file_blob(params[:file])
person.avatar.attach(blob)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1