Parsing file in Haskell
I have a file that I would like to parse into a data structure. The file looks like this:
(0,0) (33,18,109)
(0,1) (33,18,109)
(0,2) (33,21,109)
(0,3) (33,21,112)
(0,4) (33,25,112)
(0,5) (33,32,112)
(1,0) (33,18,109)
(1,1) (35,18,109)
(1,2) (35,21,109)
(1,3) (38,21,112)
and my structure looks like this:
data Pixel = Pixel { point :: (Int, Int),
                     color :: (Int, Int, Int) } deriving Show
I have heard about optparser, but I don't know how to use it. I tried something with pattern matching, but it doesn't work...
Thanks!
Solution 1:[1]
You could write your own Read instance to parse your data to Pixel:
{-# LANGUAGE TypeApplications #-}
module Main where

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show)

-- Each line has the form "(x,y) (r,g,b)": split on whitespace and read each tuple.
instance Read Pixel where
  readsPrec _ pixelRaw =
    case words pixelRaw of
      (p:c:_) -> [(Pixel (read @(Int, Int) p) (read @(Int, Int, Int) c), "")]
      _       -> []  -- wrong shape: report "no parse" instead of a pattern-match failure

main :: IO ()
main = do
  -- Drop blank lines (such as the one after a trailing newline) before reading.
  pixels <- map (read @Pixel) . filter (not . null . words) . lines <$> readFile "input.txt"
  mapM_ print pixels
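If a malformed line should produce a value rather than a runtime error, one option is to skip the Read instance and use readMaybe from Text.Read. This is only a minimal sketch: the helper names parsePixelLine and parsePixels are hypothetical, and it assumes there are no spaces inside the tuples, as in the sample file.

```haskell
import Text.Read (readMaybe)

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show, Eq)

-- Hypothetical helper: parse one line such as "(0,1) (33,18,109)".
-- Returns Nothing instead of crashing when the line has the wrong shape.
parsePixelLine :: String -> Maybe Pixel
parsePixelLine line =
  case words line of
    [p, c] -> Pixel <$> readMaybe p <*> readMaybe c
    _      -> Nothing

-- Hypothetical helper: parse every non-blank line of the file contents.
-- A single bad line makes the whole result Nothing.
parsePixels :: String -> Maybe [Pixel]
parsePixels = traverse parsePixelLine . filter (not . null . words) . lines
```

Loaded in GHCi, parsePixelLine "(0,1) (33,18,109)" gives Just (Pixel {point = (0,1), color = (33,18,109)}), and parsePixelLine "garbage" gives Nothing.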
Solution 2:[2]
For this particular file format, @ThomasMeyer's solution using read is reasonable. However, if you want to program in Haskell, it's practically mandatory that you learn how to use a monadic parser library like Parsec (or Megaparsec, Attoparsec, etc., or even the base library module Text.ParserCombinators.ReadP). This will allow you to write complex, flexible parsers to parse just about anything.
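For readers who would rather not install anything, here is a rough sketch of the same idea using only Text.ParserCombinators.ReadP from base. It mirrors the Parsec parser developed in the rest of this answer, so the operators are explained there. The wrapper parsePixels is a hypothetical name, and the sketch assumes exactly one space between the two tuples and a newline after every line.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show, Eq)

-- One or more digits, read as an Int.
int :: ReadP Int
int = read <$> munch1 isDigit

pPoint :: ReadP (Int, Int)
pPoint = (,) <$ char '(' <*> int <* char ',' <*> int <* char ')'

pColor :: ReadP (Int, Int, Int)
pColor = (,,) <$ char '(' <*> int <* char ',' <*> int <* char ',' <*> int <* char ')'

-- Assumes a single space between the tuples and a newline after each line.
pixel :: ReadP Pixel
pixel = Pixel <$> pPoint <* char ' ' <*> pColor <* char '\n'

-- Hypothetical wrapper: ReadP returns every possible parse, so keep only
-- the one that consumed the whole input.
parsePixels :: String -> Maybe [Pixel]
parsePixels input =
  case [ps | (ps, "") <- readP_to_S (many pixel <* eof) input] of
    [ps] -> Just ps
    _    -> Nothing
```

In GHCi, parsePixels "(0,0) (33,18,109)\n" gives Just [Pixel {point = (0,0), color = (33,18,109)}]. Unlike Parsec, ReadP reports no error position on failure, which is one reason to prefer a full parser library for larger formats.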
Here's how to write a Parsec parser for your file format. Start with a few preliminaries, the imports plus your data type definition:
import Text.Parsec
import Text.Parsec.String
data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show)
Your file contains a list of pixels, so we'll write a parser for that first:
file :: Parser [Pixel]
file = many pixel
This says that a file can be parsed into a list of pixels [Pixel] by "many" (zero or more) applications of the pixel parser.
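One consequence of "zero or more" is worth seeing in isolation: many stops quietly at the first item that fails without consuming input, so leftover text is not reported as an error unless the parser also asks for end of input. The sketch below uses a stand-in line parser (digitLine, a hypothetical name) rather than the pixel parser, and the strict variant with eof is a suggestion, not part of the original answer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for the pixel parser: one line holding a single digit.
digitLine :: Parser Char
digitLine = digit <* newline

-- Stops quietly at the first line that does not start with a digit.
lenient :: Parser String
lenient = many digitLine

-- Hypothetical stricter variant: also require that nothing is left over.
strict :: Parser String
strict = many digitLine <* eof
```

With the input "1\n2\nx\n", parse lenient "" returns Right "12" and silently ignores the last line, while parse strict "" returns a Left with a parse error pointing at line 3.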
The pixel parser is more complex. It parses a single line into a Pixel:
pixel :: Parser Pixel
pixel = Pixel <$> pPoint <* space <*> pColor <* newline
This parser is written in so-called "applicative" form, much like a Haskell function call with some extra applicative operators <$> and <*>. Specifically, the Pixel <$> part of the expression applies the Pixel constructor to arguments parsed by parsers: the pPoint parser that parses something of the form (1,2) and the pColor parser that parses something of the form (1,2,3). We can also intersperse these argument-generating parsers with "extra" parsers that parse additional syntax, like the space between the point and color, and the newline at the end. Note the use of <* before these "extra" parsers and <*> before the argument parser pColor. If you insert extra parentheses to show the order of application of these binary operators, the < and > characters in the operator point to the parts that get "kept" when calculating the final result:
(((Pixel <$> pPoint) <* space) <*> pColor) <* newline
   ^^^^^     ^^^^^^     ^^^^^      ^^^^^^     ^^^^^^^
   keep      keep       drop       keep       drop
The final result of applying this parser is:
Pixel whatever_is_parsed_by_pPoint whatever_is_parsed_by_pColor
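The keep/drop behaviour is not specific to Parsec: <$>, <*>, <* and *> behave the same way for any Applicative. As a small illustration, the sketch below uses Maybe in place of a parser, where Just x plays the role of a successful parse of x and Nothing plays the role of a failure. All names in it are illustrative only.

```haskell
-- <* keeps the left result and drops the right one.
keepLeft :: Maybe Int
keepLeft = Just 1 <* Just 'x'                 -- Just 1

-- *> drops the left result and keeps the right one.
keepRight :: Maybe Char
keepRight = Just (1 :: Int) *> Just 'x'       -- Just 'x'

-- Same shape as the pixel parser: the middle value must be present,
-- but it does not become an argument of (,).
pair :: Maybe (Int, Int)
pair = (,) <$> Just 1 <* Just ' ' <*> Just 2  -- Just (1,2)

-- A dropped component must still succeed, or the whole expression fails.
failed :: Maybe (Int, Int)
failed = (,) <$> Just 1 <* (Nothing :: Maybe Char) <*> Just 2  -- Nothing
```

The last definition corresponds to a line where the space between point and color is missing: the space parser's result is dropped, yet its failure still fails the whole pixel parser.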
The pPoint parser parses a pair of integers between parentheses, and I've shown the parts that get "kept" in producing the final result.
pPoint :: Parser (Int, Int)
pPoint = (,) <$ char '(' <*> int <* char ',' <*> int <* char ')'
--       ^^^                 ^^^                 ^^^
--       keep                keep                keep
The result is (,) first_parsed_int second_parsed_int which uses the pair constructor (,) to construct a pair of integers.
The pColor parser is similar:
pColor :: Parser (Int, Int, Int)
pColor = (,,) <$ char '(' <*> int <* char ',' <*> int <* char ',' <*> int <* char ')'
The int parser parses one or more digit characters into an Int:
int :: Parser Int
int = read <$> many1 digit
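Because many1 digit only ever hands digit characters to read, the call to read cannot fail here, but it also means a leading minus sign is rejected. The sample file contains no negative numbers, so this is fine as written. If negative values were possible (an assumption, not something the question states), a variant along these lines might work. The name signedInt is hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical variant of int that also accepts a leading minus sign.
signedInt :: Parser Int
signedInt = do
  -- If a '-' is present, apply negate to the number; otherwise leave it as is.
  sign <- option id (negate <$ char '-')
  sign . read <$> many1 digit
```

Here option supplies id as the default when no '-' is present, so parse signedInt "" "7" gives Right 7 and parse signedInt "" "-12" gives Right (-12).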
The complete program, with a main driver, looks like this:
import Text.Parsec
import Text.Parsec.String

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show)

file :: Parser [Pixel]
file = many pixel

pixel :: Parser Pixel
pixel = Pixel <$> pPoint <* space <*> pColor <* newline

pPoint :: Parser (Int, Int)
pPoint = (,) <$ char '(' <*> int <* char ',' <*> int <* char ')'

pColor :: Parser (Int, Int, Int)
pColor = (,,) <$ char '(' <*> int <* char ',' <*> int <* char ',' <*> int <* char ')'

int :: Parser Int
int = read <$> many1 digit

main :: IO ()
main = do
  txt <- getContents
  case parse file "(stdin)" txt of
    Left err -> error $ "bad parse: " ++ show err
    Right ps -> print ps
and it parses your input like so:
$ runghc PixelParser.hs <pixelparser.in
[Pixel {point = (0,0), color = (33,18,109)},Pixel {point = (0,1),
color = (33,18,109)},Pixel {point = (0,2), color = (33,21,109)},
Pixel {point = (0,3), color = (33,21,112)},Pixel {point = (0,4),
color = (33,25,112)},Pixel {point = (0,5), color = (33,32,112)},
Pixel {point = (1,0), color = (33,18,109)},Pixel {point = (1,1),
color = (35,18,109)},Pixel {point = (1,2), color = (35,21,109)},
Pixel {point = (1,3), color = (38,21,112)}]
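The driver above reads from standard input. If you would rather pass a file name, Text.Parsec.String also exports parseFromFile, which reads the file and runs the parser in one step. A minimal sketch, reusing the parsers from the complete program; loadPixels is a hypothetical helper name and the file name in the usage note is just an example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser, parseFromFile)

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show)

file :: Parser [Pixel]
file = many pixel

pixel :: Parser Pixel
pixel = Pixel <$> pPoint <* space <*> pColor <* newline

pPoint :: Parser (Int, Int)
pPoint = (,) <$ char '(' <*> int <* char ',' <*> int <* char ')'

pColor :: Parser (Int, Int, Int)
pColor = (,,) <$ char '(' <*> int <* char ',' <*> int <* char ',' <*> int <* char ')'

int :: Parser Int
int = read <$> many1 digit

-- Hypothetical helper: parse a named file instead of standard input.
-- The file name is also used as the source name in error messages.
loadPixels :: FilePath -> IO (Either ParseError [Pixel])
loadPixels = parseFromFile file
```

In GHCi, loadPixels "pixelparser.in" >>= either print (mapM_ print) would print either the parse error or one pixel per line.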
For some more examples/tutorials, I can recommend: Jake Wheat's Intro to Parsing with Parsec in Haskell, Two Wrongs / Parser Combinators for parsing using ReadP, and the "Using Parsec" chapter of Real World Haskell.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Thomas Meyer |
Solution 2 | K. A. Buhr |