How to train a model in SageMaker Studio with .train and .test extension dataset files?
I'm trying to implement ML models in Amazon SageMaker Studio. The model I want to use is from Hugging Face, and it uses a dataset from the CoNLL Corpora.
Following the instructions in the Hugging Face documentation, I have to read a CSV file with this instruction: train = pd.read_csv. The problem is the dataset file extension: the files end in .train and .test rather than .csv. The error I'm getting is: "ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 3"
Is there a way to convert .test files to CSV files? Or how should I read files with these extensions?
Links
Dataset: https://www.kaggle.com/nltkdata/conll-corpora
Model: https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-ner
Solution 1:[1]
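The dataset in your link seems to be tab-separated, not comma-separated. The file extension itself does not matter to pandas; only the delimiter does.
You can read it by passing the right delimiter, for example:
import pandas as pd
df = pd.read_csv("<filename>", sep="\t")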
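If the delimiter turns out not to be a tab, or if you really want a .csv file on disk, here is a minimal sketch of a converter using only the standard library. It assumes the common CoNLL layout of one token per line, whitespace-separated columns, and a blank line between sentences; I have not checked that against every file in the Kaggle dataset, so adjust as needed. The function names and the sample text are made up for illustration.

```python
import csv
import io


def read_conll_rows(lines):
    """Yield (sentence_id, col1, col2, ...) tuples from CoNLL-style lines.

    Assumption: one token per line, whitespace-separated columns,
    and a blank line between sentences.
    """
    sentence_id = 0
    in_sentence = False
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentence_id += 1
                in_sentence = False
            continue
        in_sentence = True
        yield (sentence_id, *line.split())


def conll_to_csv(lines, out):
    """Write the parsed rows to `out` as comma-separated values."""
    writer = csv.writer(out)  # quotes tokens that are themselves commas
    for row in read_conll_rows(lines):
        writer.writerow(row)


# Made-up sample in a word / POS / tag layout, for illustration only.
sample = "Hola INT O\nMadrid NP B-LOC\n\nAdios INT O\n"
buffer = io.StringIO()
conll_to_csv(sample.splitlines(), buffer)
csv_text = buffer.getvalue()
```

For a real file you would pass an open file object instead of the sample, e.g. `conll_to_csv(open(path, encoding=...), open(out_path, "w", newline=""))`, with the encoding chosen to match your files. The resulting file can then be loaded with `pd.read_csv(out_path, header=None)`.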
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | durga_sury |