Copy images from secured website to Word (or other applications)

When you copy HTML from a website containing image tags whose sources require authentication, pasting into another application that does not have those authentication credentials will not work.

I assumed that the binary image data is copied when you hit Ctrl+C, but this is not the case. Instead, the image remains an image tag with a src attribute pointing back to the origin of the HTML.

I found a workaround that Google implemented in Google Drive and its other products, but it seems very hacky IMO: https://www.theverge.com/2015/6/23/8830977/google-photos-security-public-url-privacy-protected

This approach also only works if the user has an active internet connection at the moment of Ctrl+V (e.g. go to any Wikipedia page, disable your network connection, press Ctrl+A, Ctrl+C, and then Ctrl+V into any Word document: all images will be missing).

I am currently using the JavaScript Clipboard API (https://w3c.github.io/clipboard-apis/) to rewrite the image tags to include an external access token when the user copies them for the first time. The external access tokens are saved on the server side so that the image source can be requested without authentication.

The resulting HTML in the clipboard looks something like the following example:

<p>Lorem Ipsum Dolor sit amet <img src="http://example.com/images/SomeSecuredImage.jpg?securitytoken=ABCDEFGHIJK..." /></p>
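
For reference, the rewrite itself is roughly this (a minimal sketch, assuming the server has already rendered each token into a hypothetical data-token attribute; the attribute name and query parameter are just illustrative):

document.body.addEventListener('copy', function (e) {
  e.preventDefault();
  // Clone the selection and append each image's pre-issued token to its src.
  var fragment = document.getSelection().getRangeAt(0).cloneContents();
  fragment.querySelectorAll('img').forEach(function (img) {
    var token = img.getAttribute('data-token'); // hypothetical server-rendered token
    if (token) {
      img.src += (img.src.indexOf('?') === -1 ? '?' : '&') + 'securitytoken=' + token;
    }
  });
  var container = document.createElement('div');
  container.appendChild(fragment);
  e.clipboardData.setData('text/html', container.innerHTML);
  e.clipboardData.setData('text/plain', container.textContent);
});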

Is anyone aware of other solutions for this problem, or is the "Google" approach the only one out there?

Thanks!

EDIT

I have some HTML documents and images secured by a login mechanism; my users can open them in their web browsers after logging into our system. As long as they are logged in, they can load the HTML documents and images. But as soon as they select some text including images and copy it, they cannot paste the contents, images included, into another application like Word: Word is not authenticated the way the browser is, so only the text comes through.

I have already debugged the problem and validated that Word itself makes a request to fetch the images after Ctrl+V is pressed inside Word.

So maybe someone knows a different solution than the random-URL approach mentioned above. Thanks :-)



Solution 1:[1]

Google's Puppeteer library may help you, if I understand your question correctly. It can take screenshots of DOM elements on a remote site, or capture the whole page. Hope it helps.

https://gist.github.com/malyw/b4e8284e42fdaeceab9a67a9b0263743

For more features: https://developers.google.com/web/tools/puppeteer/
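
For illustration, a minimal sketch of an element screenshot with Puppeteer (the URL, selector, and authentication step are placeholders for your own setup):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Authenticate however your site requires, e.g. by setting a session cookie first:
  // await page.setCookie({ name: 'session', value: '...', domain: 'example.com' });
  await page.goto('https://example.com/secured-document');
  // Screenshot a single DOM element, or use page.screenshot() for the whole page.
  const element = await page.$('#content img');
  await element.screenshot({ path: 'image.png' });
  await browser.close();
})();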

Solution 2:[2]

Using Data URLs

Data URLs embed the image data directly into the img src attribute, and they survive being copied to the clipboard, which avoids the need to authenticate again. From the front-end perspective, the easiest path is to have your back end serve the images as data URLs; then everything should copy and paste fine (though loading megabytes of encoded image data into HTML may make for a bad user experience). Depending on your back end, this may also be challenging to implement, so the rest of this answer covers front-end approaches.
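
As a rough sketch of the back-end idea in Node.js (the file name and MIME type are just examples), the server would inline the raw image bytes instead of emitting a URL:

const fs = require('fs');

// Read the raw image bytes and encode them as a base64 data URL,
// e.g. "data:image/jpeg;base64,/9j/4AAQ...".
const bytes = fs.readFileSync('SomeSecuredImage.jpg');
const dataUrl = 'data:image/jpeg;base64,' + bytes.toString('base64');

// The generated HTML then embeds the image data directly:
const html = '<img src="' + dataUrl + '" />';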

Front End Approaches

There doesn't seem to be a way to read already-loaded images back as data URLs directly, so you'll need to convert them or fetch them again. In both cases, the images you load need to be same-origin or loadable cross-origin (the Access-Control-Allow-Origin header should include the loading page's origin or permit everything via *). I'll assume that for use cases where authentication is already set up for the images, this shouldn't be a problem. The examples here use Wikimedia images, which allow cross-origin loading.

Dynamically Converting Images with Canvases

You can draw same-origin or cross-origin-loadable images to canvases, and then get data URLs from the canvases via the canvas.toDataURL method. The disadvantage of this approach is that you're re-encoding the images, which can introduce image degradation or inefficient file sizes.

Because all of these actions are synchronous, they can take place directly in the copy event handler. The downside is that if there are large or many images, these synchronous conversions may block the page, creating a bad user experience.

In this example, the selected images are synchronously drawn to a canvas and converted to JPEG data URLs (PNG would also work), which then replace the cloned images' src. Try copying and pasting the images into an editor that accepts HTML input:

document.body.addEventListener('copy', function (e) {
  e.preventDefault();
  // Clone the current selection so we can rewrite it without touching the page.
  var range = document.getSelection().getRangeAt(0);
  var fragment = range.cloneContents();
  var canvas = document.createElement('canvas');
  var ctx = canvas.getContext('2d');
  // Re-encode each selected image through the canvas.
  fragment.querySelectorAll('img').forEach(function (img) {
    canvas.width = img.width;
    canvas.height = img.height;
    ctx.drawImage(img, 0, 0);
    // can use toDataURL('image/png') for higher quality/size
    // or use the second argument to change jpeg quality/size
    var data = canvas.toDataURL('image/jpeg', 0.8);
    img.src = data;
  });
  // Serialize the rewritten fragment and place it on the clipboard.
  var container = document.createElement('div');
  container.appendChild(fragment);
  e.clipboardData.setData('text/html', container.innerHTML);
  e.clipboardData.setData('text/plain', container.textContent);
});

var output = document.getElementById('output');
output.addEventListener('paste', function (e) {
  var data = e.clipboardData;
  e.preventDefault();
  output.value = data.getData('text/html');
});
<div style="display:inline-block; vertical-align:top"><div><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Mona_Lisa.jpg/158px-Mona_Lisa.jpg" crossorigin></div><div style="max-width:158px">"Mona Lisa" by Leonardo da Vinci</div>
</div>
<div style="display:inline-block; vertical-align:top"><div><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/90/Monet_w1709.jpg/163px-Monet_w1709.jpg" crossorigin></div><div style="max-width:163px">Claude Monet, Water Lilies, W.1709</div>
</div>
<div>
  <textarea id="output" placeholder="HTML Paste Here"></textarea>
  <textarea placeholder="Text Paste Here"></textarea>
</div>

Asynchronously Using Fetch

Alternatively, you may want to obtain the data URLs before copying even happens. You can fetch the images as blobs and then convert them to data URLs via FileReader.readAsDataURL. The advantage of this approach is that it works with the raw image data, so there's no re-encoding, no degradation, and it doesn't block the page. It should also be relatively fast, since fetch can be served from the cache.

The disadvantage is that it happens asynchronously, which can come with complications (seen below). Because of these complications, you may want to either fetch on demand or prefetch the resources beforehand.

Fetching on Demand

You can fetch and set data URLs using fetch and the asynchronous Clipboard API. The API, while available by default in most browsers, must be explicitly enabled by the user in Firefox.
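
Given the uneven support, a quick feature check (a minimal sketch) can guard the call and fall back to the synchronous approach from the canvas example:

// Feature-detect the asynchronous Clipboard API before relying on it.
if (navigator.clipboard && navigator.clipboard.write && typeof ClipboardItem !== 'undefined') {
  // Safe to use navigator.clipboard.write([...]) as below.
} else {
  // Fall back to e.clipboardData.setData(...) inside a copy handler.
}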

var output = document.getElementById('output');
output.addEventListener('paste', function (e) {
  var data = e.clipboardData;
  e.preventDefault();
  output.value = data.getData('text/html');
});

document.body.addEventListener('copy', async function (e) {
  document.getElementById('status').textContent = 'Not Ready To Paste';
  e.preventDefault();
  // Clone the current selection so we can rewrite it without touching the page.
  var range = document.getSelection().getRangeAt(0);
  var fragment = range.cloneContents();
  // Refetch each selected image and swap its src for a data URL.
  var promises = [...fragment.querySelectorAll('img')].map(function (img) {
    return fetch(img.src, {
      // may have to try different configurations here:
      // mode: 'cors',
      // credentials: 'include' // or same-origin
    }).then(function (result) {
      return result.blob();
    }).then(function (blob) {
      // readAsDataURL turns the raw blob into a base64 data URL.
      var reader = new FileReader();
      return new Promise(function (resolve) {
        reader.addEventListener('load', function () {
          resolve(reader.result);
        });
        reader.readAsDataURL(blob);
      });
    }).then(function (dataURL) {
      img.src = dataURL;
    });
  });
  await Promise.all(promises);
  // Write the rewritten fragment to the clipboard via the async Clipboard API.
  var container = document.createElement('div');
  container.appendChild(fragment);
  await navigator.clipboard.write([
    new ClipboardItem({
      'text/plain': new Blob([ container.textContent ], { type: 'text/plain' }),
      'text/html': new Blob([ container.innerHTML ], { type: 'text/html' })
    })
  ]);
  document.getElementById('status').textContent = 'Ready To Paste';
});
<div id="status">Ready To Copy</div>
<div style="display:inline-block; vertical-align:top"><div><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Mona_Lisa.jpg/158px-Mona_Lisa.jpg" crossorigin></div><div style="max-width:158px">"Mona Lisa" by Leonardo da Vinci</div>
</div>
<div style="display:inline-block; vertical-align:top"><div><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/90/Monet_w1709.jpg/163px-Monet_w1709.jpg" crossorigin></div><div style="max-width:163px">Claude Monet, Water Lilies, W.1709</div>
</div>
<div>
  <textarea id="output" placeholder="HTML Paste Here"></textarea>
  <textarea placeholder="Text Paste Here"></textarea>
</div>

Unlike the other approaches, this one may need an active internet connection at the moment of copying (though fetch might be served from the cache).

Prefetching

Alternatively, you don't need the asynchronous Clipboard API at all if you set the data URLs beforehand. One approach is to refetch all of the images up front and replace each src with the corresponding data URL.

Here's an example of that approach:

// Refetch every image on the page and swap its src for a data URL.
var promises = [...document.querySelectorAll('img')].map(function (img) {
  return fetch(img.src, {
    // may have to try different configurations here:
    // mode: 'cors',
    // credentials: 'include' // or same-origin
  }).then(function (result) {
    return result.blob();
  }).then(function (blob) {
    var reader = new FileReader();
    return new Promise(function (resolve) {
      reader.addEventListener('load', function () {
        resolve(reader.result);
      });
      reader.readAsDataURL(blob);
    });
  }).then(function (dataURL) {
    img.src = dataURL;
  });
});
// Once every image has been inlined, a plain Ctrl+C will copy data URLs.
Promise.all(promises).then(function () {
  document.getElementById('status').textContent = 'Ready To Copy';
});

var output = document.getElementById('output');
output.addEventListener('paste', function (e) {
  var data = e.clipboardData;
  e.preventDefault();
  output.value = data.getData('text/html');
});
<div id="status">Not Ready To Copy</div>
<div style="display:inline-block; vertical-align:top"><div><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Mona_Lisa.jpg/158px-Mona_Lisa.jpg" crossorigin></div><div style="max-width:158px">"Mona Lisa" by Leonardo da Vinci</div>
</div>
<div style="display:inline-block; vertical-align:top"><div><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/90/Monet_w1709.jpg/163px-Monet_w1709.jpg" crossorigin></div><div style="max-width:163px">Claude Monet, Water Lilies, W.1709</div>
</div>
<div>
  <textarea id="output" placeholder="HTML Paste Here"></textarea>
  <textarea placeholder="Text Paste Here"></textarea>
</div>

If you're using a modern framework, you could probably make a component that does this seamlessly and automatically.
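
For example, here's a hypothetical React sketch of such a component (the component name and props are made up, and the fetch-options caveat from the snippets above still applies):

import { useEffect, useState } from 'react';

// Hypothetical component: renders its image as a data URL so that
// copied markup embeds the pixel data instead of an authenticated URL.
function InlineImage({ src, alt }) {
  const [dataUrl, setDataUrl] = useState(null);
  useEffect(() => {
    let cancelled = false;
    fetch(src)
      .then((res) => res.blob())
      .then((blob) => new Promise((resolve) => {
        const reader = new FileReader();
        reader.addEventListener('load', () => resolve(reader.result));
        reader.readAsDataURL(blob);
      }))
      .then((url) => { if (!cancelled) setDataUrl(url); });
    return () => { cancelled = true; };
  }, [src]);
  // Render nothing until the data URL is ready (a placeholder would also work).
  return dataUrl ? <img src={dataUrl} alt={alt} /> : null;
}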

Offline Use

As the questioner mentioned, an additional advantage of data URLs is that, once set, they don't need to be fetched from the server again, so they may be available for offline use. The prefetch example above does need to refetch the images initially, but that can happen at page load. The fetch-on-demand example may need an active internet connection at the time of copying (though it might work from the cache).

Solution 3:[3]

I just tried it, and it works fine if you save your page as MHTML, then double-click the file and copy the content from there. Word pastes the images as images, not as links.

In your browser, select "File > Save Page As...", then choose the "MHTML" format from the dropdown list. It will save the whole page, images included, into one file.

Tested with Word 2007 and the Vivaldi, Opera, and Chrome browsers on Windows.

Solution 4:[4]

Have you looked at browser plugins?

To solve your problem, you still need to fetch the binary data somehow. You can use an existing plugin or build your own.

Solution 5:[5]

In your browser, select File > Save Page As and choose the "MHTML" format from the dropdown list. It will save the whole page, images included, into one file.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 bgul
Solution 2
Solution 3 Flash Thunder
Solution 4 Eriks Klotins
Solution 5 Shubham Prakash