'Find and replace duplicate rows with the newest data in a defined range using Google Apps Script

Google sheet using columns A:M. Each day a new set of raw data (around 30 rows) is pasted on the next available row in column B manually. This raw data contains duplicate rows that have some updated column contents since the previous day. Column C contains the Job Number which is used to determine duplicates, and column A contains a sequential ID from 1 to n.

I need to create a function that finds duplicate rows based on column C, then uses the row with the largest Column A value to overwrite the data in the lowest Column A value so that the order of jobs numbers added to the sheet never changes once a job is in there.Image of sheet currently In the image; job 22,484 on row 4 would have due date updated to 10/5/22 and job data on row 5 would be removed. For each day's raw data there will be 20-25 duplicate entries.

The maximum rows used will be 5000 so the script can be inefficient if necessary.

Thanks! Connor

Here's a link to dummy sheet! (Thanks Mike)



Solution 1:[1]

Solution:

function myFunction() {

  const sheet = SpreadsheetApp.getActiveSpreadsheet()
                              .getSheetByName(`YOUR_SHEET`)
  
  let data = sheet.getRange(`B2:M`)
                  .getDisplayValues()
                  .filter(row => row.every(cell => cell.length))

  const duplicates = [...new Set(data.flatMap(i => i[1]))].map(i => data.filter(item => item[1] === i))
                                                          .filter(i => i.length > 1)
                                                          .map(i => i[i.length-1])

  duplicates.forEach(i => {
    const removeRow = data.splice(data.findIndex(item => item === i), 1)
    data[data.findIndex(item => item[1] === i[1])] = removeRow.flat()
  })

  sheet.getRange(`B2:M`).clear()
  sheet.getRange(2, 2, data.length, data[0].length).setValues(data)

}

This will get all non-empty rows of table data, find the duplicates, and replace the previous(old) entries with the new data.

Let me know if this works for you!

Commented:

function myFunction() {

  const sheet = SpreadsheetApp.getActiveSpreadsheet()
                              .getSheetByName(`YOUR_SHEET`)

  // Get the target range..
  let data = sheet.getRange(`B2:M`)
                  // As text.. (Avoids date issues)
                  .getDisplayValues() 
                  // ...and ignore blank rows.
                  .filter(row => row.every(cell => cell.length))

  // Get all unique `Job No`'s and replace their value in this array with the relevant rows from 'data'...
  const duplicates = [...new Set(data.flatMap(i => i[1]))].map(i => data.filter(item => item[1] === i))
                                                          // Keep all `Job No`'s rows with more than 1 entry..
                                                          .filter(i => i.length > 1)
                                                          // ...And keep only the most recent.
                                                          .map(i => i[i.length-1])

  // For each of these duplicates..
  duplicates.forEach(i => {
    // Remove the 'new' row..
    const removeRow = data.splice(data.findIndex(item => item === i), 1)
    // Replace the 'old' row..
    data[data.findIndex(item => item[1] === i[1])] = removeRow.flat()
  })

  sheet.getRange(`B2:M`).clear()
  sheet.getRange(2, 2, data.length, data[0].length).setValues(data)

}

Solution 2:[2]

Remove Dupes

function removeDupsOnColC() {
  const ss = SpreadsheetApp.getActive();
  const sh = ss.getSheetByName(`DEBUG`)
  let data = sh.getRange(`B2:M` + sh.getLastRow()).getDisplayValues();
  let ua = [];
  let oA = [];
  data.forEach((r,i) => {
    if(!~ua.indexOf(r[2])) {
      ua.push(r[2]);
      oA.push(r);
    }
  })
  sh.getRange(2, 2, oa.length, oa[0].length).setValues(oa)
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Cooper