%mount_workspace_dir notebook magic not working in EMR Studio
In an EMR Studio Python3 notebook, I execute the following:
%mount_workspace_dir .
And receive the following error:
UsageError: Line magic function `%mount_workspace_dir` not found.
I set up the EMR cluster for Studio using a CloudFormation template that is accessible to Studio via Service Catalog. The CloudFormation template specifies a bootstrap script that installs s3fs-fuse. The template also specifies a step, executed when the cluster launches, that installs emr-notebooks-magics using pip.
After the cluster launches, I execute the above %mount_workspace_dir command and receive the indicated error. I also tried restarting the kernel using the Kernel->Restart Kernel option from the menu.
Here is the CloudFormation template (with substitutions for subnet and bucket names):
---
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  SubnetId:
    Type: "String"
Resources:
  EmrCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Applications:
        - Name: Spark
        - Name: Livy
        - Name: JupyterEnterpriseGateway
        - Name: Hive
        - Name: Presto
      EbsRootVolumeSize: '50'
      Name: !Join ['-', ['emr-studio-', !Select [4, !Split ['-', !Select [2, !Split ['/', !Ref AWS::StackId]]]]]]
      JobFlowRole: emr-studio-instance-role
      ServiceRole: EMR_DefaultRole
      ReleaseLabel: "emr-6.3.0"
      VisibleToAllUsers: true
      LogUri:
        Fn::Sub: 's3://<my-bucket>/'
      Instances:
        TerminationProtected: false
        Ec2SubnetId: '<my-subnet>'
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: "m5.xlarge"
      BootstrapActions:
        - Name: Auto-Termination
          ScriptBootstrapAction:
            Path: "s3://<my-bucket>/scripts/bootstrap-actions/install-s3fs-fuse.sh"
      Steps:
        - Name: Enable-Notebooks-Magics
          ActionOnFailure: CONTINUE
          HadoopJarStep:
            Jar: command-runner.jar
            Args:
              - "sudo"
              - "/mnt/notebook-env/bin/pip"
              - "install"
              - "emr-notebooks-magics"
Outputs:
  ClusterId:
    Value:
      Ref: EmrCluster
    Description: The ID of the EMR Cluster
Here is the content of the install-s3fs-fuse.sh script:
sudo amazon-linux-extras install epel -y
sudo yum install s3fs-fuse -y
I also tried with EMR 6.5.0.
Is there a step that I'm missing?
Solution 1:[1]
It appears there is a bug in the setup.py script of emr-notebooks-magics: it does not copy the 001-setup-emr-notebook-magics.py script to the IPython startup directory of the emr-notebook user. To make it work, I needed to add a second step that copies the script there:
Steps:
  - Name: Enable-Notebooks-Magics
    ActionOnFailure: CONTINUE
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
        - "sudo"
        - "/mnt/notebook-env/bin/pip3"
        - "install"
        - "emr-notebooks-magics"
  - Name: Copy-Magics-Script
    ActionOnFailure: CONTINUE
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
        - "sudo"
        - "cp"
        - "/mnt/notebook-env/bin/001-setup-emr-notebook-magics.py"
        - "/home/emr-notebook/.ipython/profile_default/startup/"
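To confirm whether the startup script actually landed in the IPython startup directory, a check along these lines can be run in a notebook cell. This is only a minimal sketch: the helper name `find_magics_startup_scripts` is hypothetical, and it assumes the kernel runs as the emr-notebook user, so that the home directory resolves to /home/emr-notebook.

```python
from pathlib import Path


def find_magics_startup_scripts(startup_dir):
    """Return the names of startup scripts that mention emr-notebook-magics, sorted."""
    directory = Path(startup_dir)
    if not directory.is_dir():
        # A missing startup directory means nothing can be loaded from it.
        return []
    return sorted(p.name for p in directory.glob("*emr-notebook-magics*.py"))


# IPython runs the .py files in this directory each time a kernel starts.
# Assumption: the kernel's home directory is /home/emr-notebook.
default_startup = Path.home() / ".ipython" / "profile_default" / "startup"
print(find_magics_startup_scripts(default_startup))
```

An empty list would suggest the script was not copied, which matches the "magic function not found" error. After the copy step has run, the kernel still needs a restart so that the startup script is picked up.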
Solution 2:[2]
You can update the EMR step to install the package as the emr-notebook user, so that the startup file is copied to the right directory:
sudo -u emr-notebook /mnt/notebook-env/bin/pip install emr-notebooks-magics
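In the CloudFormation template, each whitespace-separated token of that command becomes its own entry under Args for command-runner.jar, in the same way as the step shown in the question. The following sketch illustrates the mapping; the helper name `command_to_step_args` is hypothetical, and the printed lines are meant to be pasted under Args: with matching indentation.

```python
import shlex


def command_to_step_args(command):
    """Split a shell command into the list of tokens used as step Args."""
    return shlex.split(command)


command = "sudo -u emr-notebook /mnt/notebook-env/bin/pip install emr-notebooks-magics"
for token in command_to_step_args(command):
    # Emit each token as one quoted YAML list item.
    print(f'- "{token}"')
```

Using `shlex.split` rather than `str.split` keeps quoted arguments together, which matters if the command is later extended with arguments that contain spaces.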
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | dennislloydjr |
| Solution 2 | cyn0 |