paramiko - slow sftp transfer speed compared with system rsync/sftp/scp

I have noticed that I can't get the same transfer speed when performing a get or put with paramiko's sftp.

Across our gigabit network, a file transfer from our Mac mini server (running macOS 10.12.6) via rsync/sftp/scp/Finder sustains around 95-100 MB/s. If I use paramiko's sftp.get, it tops out at about 25 MB/s.
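
The paramiko side is a plain sftp.get, roughly like the sketch below (hostname, credentials and paths are placeholders):

    import os
    import time
    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("macmini.local", username="user", password="secret")  # placeholders
    sftp = client.open_sftp()

    start = time.time()
    sftp.get("/remote/big.file", "/tmp/big.file")  # placeholder paths
    elapsed = time.time() - start
    size_mb = os.path.getsize("/tmp/big.file") / 1e6
    print("{:.1f} MB in {:.1f} s = {:.1f} MB/s".format(size_mb, elapsed, size_mb / elapsed))

    client.close()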

I was using paramiko 1.17 and updated to 2.3.1 but am seeing pretty much the same speed.

Any ideas what could be causing the limitation?

Thanks!

Adam



Solution 1:

I ran into the same problem and implemented a few suggestions that others made. There are three things that can be done:

  1. Increase the window size and rekeying thresholds on your transport. A larger window means fewer flow-control stalls on a fast link.

      transport = paramiko.Transport((ftp_host, ftp_port))  # note the (host, port) tuple
      transport.default_window_size = 4294967294            # ~4 GB window (alternatively 2147483647)
      transport.packetizer.REKEY_BYTES = pow(2, 40)          # rekey after ~1 TB instead of the much lower defaults
      transport.packetizer.REKEY_PACKETS = pow(2, 40)
    
  2. Perform a read-ahead prior to getting the file.

     ftp_file = ftp_conn.file(file_name, "r")      # ftp_conn is a paramiko SFTPClient
     ftp_file_size = ftp_file.stat().st_size
     ftp_file.prefetch(ftp_file_size)              # queue read requests for the whole file
     ftp_file.set_pipelined()
     ftp_file_data = ftp_file.read(ftp_file_size)
    
  3. The other thing you can do when transferring larger files is to implement "chunks". This splits the file into smaller pieces that are transferred individually. I have only tested this with a transfer to S3 (a sketch of the same chunked pattern writing to a local file appears further down).

     # Assumes "import math" and "import time"; s3_conn is a boto3 S3 client,
     # and ftp_file / ftp_file_size come from step 2 above.
     chunk_size = 6000000  # 6 MB (S3 multipart parts must be at least 5 MB)
     chunk_count = int(math.ceil(ftp_file_size / float(chunk_size)))
     multipart_upload = s3_conn.create_multipart_upload(Bucket=bucket_name, Key=s3_key_val)
     parts = []
     for i in range(chunk_count):
         print("Transferring chunk {} of {}...".format(i + 1, chunk_count))

         start_time = time.time()
         # This prefetch is where the magic is to keep speeds high:
         # read ahead a little further on every pass through the loop.
         ftp_file.prefetch(chunk_size * (i + 1))
         chunk = ftp_file.read(chunk_size)
         part = s3_conn.upload_part(
             Bucket=bucket_name,
             Key=s3_key_val,
             PartNumber=i + 1,  # S3 part numbers start at 1
             UploadId=multipart_upload["UploadId"],
             Body=chunk
         )
         end_time = time.time()
         total_seconds = end_time - start_time
         print("speed is {} kB/s, total seconds taken {}".format(
             math.ceil((chunk_size / 1024) / total_seconds), total_seconds))
         parts.append({"PartNumber": i + 1, "ETag": part["ETag"]})
         print("Chunk {} transferred successfully!".format(i + 1))

     part_info = {"Parts": parts}
     s3_conn.complete_multipart_upload(
         Bucket=bucket_name,
         Key=s3_key_val,
         UploadId=multipart_upload["UploadId"],
         MultipartUpload=part_info
     )
    

The important part while processing the chunks is the ftp_file.prefetch(chunk_size * (i + 1)) call, which reads ahead incrementally further on each pass through the loop.
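
The same idea works for a plain SFTP download to a local file, with no S3 involved. A rough sketch (host, credentials and paths are placeholders, window and rekey settings from step 1):

    import math
    import paramiko

    # Placeholder connection details and paths.
    ftp_host, ftp_port = "sftp.example.com", 22
    ftp_user, ftp_password = "user", "secret"
    remote_path, local_path = "/remote/big.file", "/tmp/big.file"

    transport = paramiko.Transport((ftp_host, ftp_port))
    transport.default_window_size = 4294967294
    transport.packetizer.REKEY_BYTES = pow(2, 40)
    transport.packetizer.REKEY_PACKETS = pow(2, 40)
    transport.connect(username=ftp_user, password=ftp_password)
    sftp = paramiko.SFTPClient.from_transport(transport)

    chunk_size = 6000000  # 6 MB
    ftp_file = sftp.file(remote_path, "r")
    file_size = ftp_file.stat().st_size
    chunk_count = int(math.ceil(file_size / float(chunk_size)))

    with open(local_path, "wb") as out:
        for i in range(chunk_count):
            # Read ahead a little further on every pass, capped at the file size.
            ftp_file.prefetch(min(chunk_size * (i + 1), file_size))
            out.write(ftp_file.read(chunk_size))

    ftp_file.close()
    sftp.close()
    transport.close()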

After implementing all of this, downloads went from 200 kB/s to 5 MB/s (the maximum speed of the tunnel).

In a later iteration of this code I ran into problems with the garbage collection from paramiko. I resolved them by removing the line:

ftp_file.set_pipelined() 
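
So the read-ahead from step 2 ends up as just the following (same ftp_conn and file_name as above), with the pipelining call gone:

    ftp_file = ftp_conn.file(file_name, "r")
    ftp_file_size = ftp_file.stat().st_size
    ftp_file.prefetch(ftp_file_size)              # keep the read-ahead
    ftp_file_data = ftp_file.read(ftp_file_size)  # but no set_pipelined() in between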

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
