How do I add and use MNIST in Julia 1.6.6?
The code for Mohammad Nauman's excellent book shows this (for Julia 1.5.3):
using Flux, Statistics
using Flux.Data.MNIST
using Flux: onehotbatch
Which fails under Julia 1.6.6 with
UndefVarError: MNIST not defined
Stacktrace:
[1] eval
@ ./boot.jl:360 [inlined]
[2] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1116
So I try
] add MNIST
which gives
The following package names could not be resolved:
* MNIST (not found in project, manifest or registry)
If I try
using MNIST
it gives
ArgumentError: Package MNIST not found in current path:
- Run `import Pkg; Pkg.add("MNIST")` to install the MNIST package.
If I then try the recommended
import Pkg; Pkg.add("MNIST")
it gives
The following package names could not be resolved:
* MNIST (not found in project, manifest or registry)
The author's code also gives the same error under 1.6.6.
How can I use MNIST under Julia 1.6.6?
Solution 1:[1]
The MNIST dataset is available from the MLDatasets.jl package.
More information is available in the MNIST section of the package documentation.
]add MLDatasets
using MLDatasets
# load training set
train_x, train_y = MNIST.traindata()
# load test set
test_x, test_y = MNIST.testdata()
Some background on the above: I don't have the book, so I can't check exactly which version of Flux it uses, but it must be a version prior to v0.12.0, which is when the datasets were removed (see commit b78cd76) in favor of MLDatasets (relevant PR). Having a different Julia version does not prevent you from installing an older version of Flux, but I would not recommend doing so if this is the only issue you're facing. Up-to-date tutorials use MLDatasets, and the Julia community in general tends to converge on a single package for a particular purpose.
To clarify the example above, where you would previously write:
train_x = MNIST.images(:train)
train_y = MNIST.labels(:train)
test_x = MNIST.images(:test)
test_y = MNIST.labels(:test)
you would now instead use the code above. The labels are identical in the two cases:
julia> train_x, train_y = MLDatasets.MNIST.traindata();
julia> Flux.Data.MNIST.labels(:train) == train_y
true
However, Flux.Data.MNIST.images(:train) returns a Vector of images (28x28 matrices with eltype Gray{N0f8}), while MLDatasets returns (more or less) a 3D tensor (28x28x60000). To get data identical to that in Flux.Data.MNIST, we need to split the tensor into its matrices, turn them into images (Gray elements), and transpose them.
julia> using ImageCore
julia> map(transpose, eachslice(Gray.(train_x); dims=3)) == Flux.Data.MNIST.images(:train)
true
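The split-and-transpose step can be tried without downloading anything. The following is only a minimal sketch: it uses a small random array as a stand-in for the 28x28x60000 tensor, and it leaves out the Gray conversion so that it needs nothing beyond Base Julia. The names x, imgs and lazy are illustrative.

```julia
# Synthetic stand-in for the 28x28xN array returned by MLDatasets
# (assumption: 5 random "images" instead of the real 60000).
x = rand(Float32, 28, 28, 5)

# Split along the third dimension and transpose each slice.
# permutedims materializes each transposed slice as a plain Matrix.
imgs = [permutedims(x[:, :, i]) for i in axes(x, 3)]

# The lazy form from the answer: transposed views of each slice.
lazy = map(transpose, eachslice(x; dims=3))

println(length(imgs))    # number of images
println(size(imgs[1]))   # size of a single image
println(imgs == lazy)    # both forms hold the same values
```

The comprehension copies the data, while map(transpose, eachslice(...)) only wraps views of the original array, so the second form is cheaper when the images are not modified afterwards.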
If you decide that you prefer an older version of Flux, you could try v0.12.2 through v0.12.10. They are compatible with your Julia version and "still" have Flux.Data.MNIST (the datasets were added back but marked as deprecated):
pkg> add Flux#v0.12.10
Solution 2:[2]
You can use MLDatasets as follows:
using Flux
using Flux: Data.DataLoader
using Flux: onehotbatch, onecold, crossentropy
using Flux: @epochs
using Statistics
using MLDatasets
# Load the data
x_train, y_train = MLDatasets.MNIST.traindata()
x_valid, y_valid = MLDatasets.MNIST.testdata()
# Add the channel layer
x_train = Flux.unsqueeze(x_train, 3)
x_valid = Flux.unsqueeze(x_valid, 3)
# Encode labels
y_train = onehotbatch(y_train, 0:9)
y_valid = onehotbatch(y_valid, 0:9)
# Create the full dataset (Flux v0.11 and later expect the arrays wrapped in a tuple)
train_data = DataLoader((x_train, y_train), batchsize=128)
(Source: https://towardsdatascience.com/deep-learning-with-julia-flux-jl-story-7544c99728ca)
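To see what shapes this pipeline produces without downloading MNIST, here is a minimal sketch on synthetic data. It assumes a Flux version in which Flux.Data.DataLoader takes its data as a tuple (v0.11 through v0.13, as far as I know), and it uses Base's reshape in place of Flux.unsqueeze to add the channel dimension. The array sizes and variable names are illustrative only.

```julia
using Flux
using Flux: onehotbatch
using Flux.Data: DataLoader

# Synthetic stand-ins for the MNIST arrays (assumption: 256 samples).
x = rand(Float32, 28, 28, 256)   # width x height x samples
y = rand(0:9, 256)               # integer class labels

# Add the channel dimension: 28x28xN becomes 28x28x1xN.
x = reshape(x, 28, 28, 1, :)

# One-hot encode the labels into a 10xN matrix.
y = onehotbatch(y, 0:9)

# Batch the features and labels together.
loader = DataLoader((x, y), batchsize=128)

for (xb, yb) in loader
    println(size(xb), " ", size(yb))
end
```

Each batch carries the samples along the last dimension, which is the layout Flux's convolutional layers expect.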
However, the discussion of shapes in the book/course will not match, since the structure of the data has changed in the MLDatasets package. If you want to follow that discussion in earnest, I suggest installing the version I'm using there. You can also make sense of it yourself if you're willing to put in the time.
Disclaimer: I'm the author mentioned in the original post.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | desertnaut |
Solution 2 | recluze |