Laravel 9.x and MySQL: how to issue an UPDATE on millions of records
I have a project where certain records in a table "expire" after a set number of days. I use soft deletes, so all I have to do is set the deleted_at column. Currently the generated SQL is:
```sql
update st_photos set deleted_at = '2022-05-03 05:00:01', updated_at = '2022-05-03 05:00:01'
where user_id in (select id from users where account_code = 'S') and created_at < '2022-03-04 05:00:01' and in_storage = 0 and deleted_at is null
```
Right now there are only about 16 million records, and the update statement takes about 65 seconds to finish. The problem is that if other code tries to insert a record into the table while the update is running, the insert fails with a Lock Wait Timeout.

Is there a way to break the update statement into smaller batches to give other SQL commands time to run? My biggest concern is NOT how long the overall process takes; I just don't want any single batch update to run for longer than about 30 seconds, so it does not lock other SQL commands out for too long.

One way I could do it is to group the users in the subselect so that the photos are processed in batches, but I was wondering if anyone has a better, easier-to-implement way of doing it.

If it helps, below are the Laravel code snippets that generate the above SQL.
```php
self::getExpiredMediaQuery(AppConstants::ACCOUNT_CODE_STANDARD)
    ->update(['deleted_at' => $now, 'updated_at' => $now]);

// ...

public static function getExpiredMediaQuery($accountCode) {
    $dateExpired = Carbon::now()
        ->subDays(AppConstants::$accountProperties[$accountCode]['mediaRetentionDays']);

    $query = Photo::whereIn('user_id', function ($query) use ($accountCode) {
            $query->select('id')->from('users')
                ->where('account_code', $accountCode);
        })
        ->where('created_at', '<', $dateExpired)
        ->where('in_storage', false);

    return $query;
}
```
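For reference, one way to express that batching idea without grouping users is to repeat a LIMITed UPDATE until no rows are affected. A minimal sketch, assuming the statement stays a single-table UPDATE (MySQL only allows LIMIT there; the user filter above is a subselect, not a join) and reusing $now from the snippet above:

```php
// Sketch: repeat a bounded UPDATE until nothing is left to update.
// Each pass excludes already-updated rows because the soft-delete
// scope filters on deleted_at is null, so the loop terminates.
do {
    $affected = self::getExpiredMediaQuery(AppConstants::ACCOUNT_CODE_STANDARD)
        ->limit(10000)
        ->update(['deleted_at' => $now, 'updated_at' => $now]);
} while ($affected > 0);
```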
Solution 1:[1]
Use a chunking method like chunkById(). Something like this should work to break the job into manageable pieces:
```php
self::getExpiredMediaQuery(AppConstants::ACCOUNT_CODE_STANDARD)
    ->chunkById(1000, function ($photos) use ($now) {
        // $photos is a collection of Photo models, not a single model;
        // toQuery() turns each chunk back into a single UPDATE statement
        $photos->toQuery()->update(['deleted_at' => $now, 'updated_at' => $now]);
    });
```

Because chunkById() pages on the id column, rows that stop matching the query as they are soft deleted do not cause chunks to be skipped, which is exactly the failure mode chunk() would have here.
I'd be interested in seeing an up-to-date reference for the suggestion in the comments that WHERE … IN is slow, but as mentioned, the query in the method could be built with a join instead. Using the query builder is preferable to dumping a raw query into a DB call:
```php
Photo::join("users", "photos.user_id", "=", "users.id")
    ->where("photos.in_storage", false)
    ->where("photos.created_at", "<", $dateExpired)
    ->where("users.account_code", $accountCode)
    // the joined users rows are not Eloquent models;
    // if users has soft deletes they must be filtered out manually
    ->whereNull("users.deleted_at");
The same chunking approach as above could then be used.
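For completeness, here is a sketch of the join combined with the chunked update. The select("photos.*") call and the "photos.id"/"id" column and alias arguments are my additions, not part of the original answer; they keep model hydration and keyset pagination unambiguous when two tables both have an id column:

```php
Photo::join("users", "photos.user_id", "=", "users.id")
    ->where("photos.in_storage", false)
    ->where("photos.created_at", "<", $dateExpired)
    ->where("users.account_code", $accountCode)
    ->whereNull("users.deleted_at")
    // hydrate only photo columns so each model's id is photos.id
    ->select("photos.*")
    ->chunkById(1000, function ($photos) use ($now) {
        $photos->toQuery()->update(['deleted_at' => $now, 'updated_at' => $now]);
    }, "photos.id", "id"); // qualified column + alias, needed when chunking a joined query
```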
Solution 2:[2]
You can use Laravel Job Batching for this kind of task. It will be easier to manage in the future if you have more records and more complex operations.
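A minimal sketch of that idea, assuming one queued job per user. The ExpireUserPhotos class name is hypothetical, and the dispatch side assumes the batches table has been set up (php artisan queue:batches-table followed by a migration):

```php
namespace App\Jobs;

use App\Models\Photo; // plus the question's AppConstants import
use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

// Hypothetical job: soft deletes the expired photos of a single user,
// reusing the getExpiredMediaQuery() helper from the question.
class ExpireUserPhotos implements ShouldQueue
{
    use Batchable, Dispatchable, Queueable;

    public function __construct(private int $userId)
    {
    }

    public function handle(): void
    {
        $now = now();

        Photo::getExpiredMediaQuery(AppConstants::ACCOUNT_CODE_STANDARD)
            ->where('user_id', $this->userId)
            ->update(['deleted_at' => $now, 'updated_at' => $now]);
    }
}
```

Dispatching one job per matching user then looks like this; each job's UPDATE is small, so no single statement holds locks for long:

```php
use App\Jobs\ExpireUserPhotos;
use App\Models\User;
use Illuminate\Support\Facades\Bus;

$jobs = User::where('account_code', AppConstants::ACCOUNT_CODE_STANDARD)
    ->pluck('id')
    ->map(fn ($id) => new ExpireUserPhotos($id))
    ->all();

Bus::batch($jobs)->name('expire-standard-photos')->dispatch();
```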
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Saravana Sai |