I have an idea of writing a Haskell library that generates bash scripts. The library will be called GHB, which stands for Generate (from) Haskell (to) Bash (actually it is a joke name, because it differs only 1 from GHC).
Because writing bash script is error-prone and the conditioning / casing / loops are not very pleasant to write, I want to write a library that will be able to generate complicated bash scripts in a type-safe manner.
The library, as I imagine, will provide a set of combinators that will allow the user to generate bash scripts in a type-safe manner. The library will enforce best practices for writing bash scripts and will help the user avoid common pitfalls and errors.
There are some haskell libraries like Turtle
that allows you to do scripting directly in haskell, but that means you must have haskell compiler installed in the system. The idea of GHB is to generate bash scripts that can be run on any system.
Using a variable before it is defined, or using a variable that is not defined.
Using a variable that is defined in a different scope, for example, using a variable defined in a function in the global scope.
Enforce safe variable names
local variables use lowercase letters global variables use uppercase letters
Expand long parameter notation whenever possible
like rm --recursive --force
instead of rm -rf
Use the correct quoting style
Use single quotes for strings that do not contain variables
Use double quotes for strings that contain variables
Always Enforce
set -o errexit # Exit on error
set -o nounset # Exit if an unset variable is used
set -o pipefail # Exit if any command in a pipeline fails
Or we can let users to handle errors in a more systematic way.
Automatically examine if the command is available in the system
potentially: automatically install the command if it is not available in the system,
for this we need to know the package manager of the system.
Automatically add the shebang line to the script
Safe conditionals, and control structures, casing, loops, etc.
Variables will have a type, and the type will be checked at compile time, and in bash script, the type info will be erased, but will be inserted in the comments.
A command or a function has a type, represented in our type system.
common functions should be included in the library, their types will be checked and eventually rendered as a command in the bash script.
Let’s consider what types of commands we can have in bash:
A command can have some input values, strings, numbers, file paths or directory paths, etc.
A command may produce output in stdout, stderr, or files.
A command can have exit codes, with 0
meaning success and non-zero meaning failure.
A command taking no input or one string input, printing string in stdout or strerr, with return codes.
A command that can take flags with no associated arguments, or flags with associated arguments.
For example
ls -l
or ls --long
or ls --long --all
are examples of flags with no associated arguments. It has one input as a directory path.
find . -name "*.hs"
is an example of a flag with an associated argument, the associated argument to the flag -name
is *.hs
and the input to the command is a file path .
.
A command that reads input in stdin, and produces output in stdout or stderr.
For example, sort
command reads input from stdin and produces output in stdout or read from a file.
Commands that produce side effect only, for example modifying enviroment
export PATH=$PATH:/usr/local/bin
Bash script functions
Functions can take arguments $1
, $2
, etc.
Functions can produce output in stdout or stderr.
Functions can have return codes using return
statement.
Arguments and option flags
flags can be optional or required,
flags can have associated arguments or not, depending on the flag.
Stdin and Stdout
command < file.txt
command <<EOF
(...)
EOF
A command can produce output in stdout or stderr.
write to stdout command > file.txt
append command >> file.txt
write to stderr command 2> file.txt
write both stdout and stderr to the same file command > file.txt 2>&1
Pipes
A command can pass its output to another command using pipes.
command1 | command2
Typically return code 0
meaning success and non-zero meaning failure. Other return codes can depend on the command. Our library should encode some of the return types in its abstract syntax for common commands, and at the same time allow directly manipulating the return codes.
The return code is stored in the special variable $?
immediately after the command is executed, to read it we can do like
command
status=$? # can be omitted, you can use "if [ $? -eq 0 ]; then" instead.
if [ $status -eq 0 ]; then
echo "Success"
else
echo "Failure"
fi
# or something like this
if [ $? -ne 0 ]; then
echo "Failure"
fi
We want to write things monadically, and move the job of assigning variables to the library. We want to write things like
do
<- ls "-l" -- which should be a command that returns a string
str echo str
We want to have a GADT that represents the type of the command, and the type of the command will be checked at compile time.
Command can have a parsable type class, which can help us to parse the output of the command into structured data. The type class will have a type family inside that will represent the type of the output of the command.
class Parsable cmd where
type Parsed cmd
parse :: Stdout cmd String -> Maybe (Parsed cmd)
instance Parsable Ls where
type Parsed Ls = LinesOf FilePath
= undefined parse
Unfortunately these parsers will need to be compiled into bash as well, I wonder if there is a way to compile any monadic parser into a bash one.
Let’s review the above simplistic example to get an idea of what we are looking for.
The output of ls is stdout, the output should be given this type ‘Stdout String’
ls :: Params Ls -> Maybe DirPath -> Command (Stdout Ls String)
if we use cat str, it can be done by piping ls -l | cat
, but not with echo, because echo does not take input from stdin. This time we need to use variables, or xargs. We will consider using variables as a more systematic way.
Consider the following types for ls
and cat
:
echo :: String -> Command (Stdout Echo String)
cat :: Params Cat
-> [FilePath]
-> Maybe (Stdin String)
-> Command (Stdout Cat String)
(|) :: Command (Stdout cmda a)
-> (Stdin a -> Command (Stdout cmdb b))
-> Command (Stdout cmdb b)
Here Params a
is a type family or just some plain algebraic data type (change to ParamsLS
and ParamsCAT
for this design), that represents different type of parameters that a command can take.
A value is a string, array or number or some form of data that can be accessed and manipulated in bash. It could be literals Lit a
or variables Var name
.
data Value a
= Lit a
| Var String -- variable
A command is a piece of code (doesn’t have to be one command, can be multiple lines) that already filled all its parameters, when you run it, it will do something and produce the desired output prescribed by the type wrapped by the Command
type.
data Command o where
Ls :: Params Ls -> Value (Maybe DirPath) -> Command (Stdout LS String)
Echo :: Value String -> Command (Stdout ECHO String)
Cat :: Params CAT
-> Value [FilePath]
-> Maybe (Stdin a)
-> Command (Stdout CAT a)
CmdStdin :: (HasStdin cmd)
=> Params cmd
-> Stdin cmd -> Command (Stdout cmd a)
CmdNoStdin
:: Params cmd
-> Command (Stdout cmd a)
(|>) :: Command (Stdout cmda a)
-> (Stdin a -> Command (Stdout cmdb b))
-> Command (Stdout cmdb b)
Command
should be a ‘functor’, but should only work on functions that is implementable in bash.
A bash is a bash program, piece of code, function that can run or compose.
data Bash a where
The following is my first trial on implementing the GHB library. The code is not complete, and the design is probably not optimal, but it gives an idea of what I am trying to achieve owo!
{-# LANGUAGE GADTs, MultiParamTypeClasses, PolyKinds, TypeOperators, TypeFamilies, OverloadedStrings, FlexibleInstances, TypeApplications, DataKinds, ScopedTypeVariables, AllowAmbiguousTypes, DerivingVia #-}
-- | GHB is an EDSL for Haskell -> Bash scripting.
-- Author: Eiko
module GHB where
import Control.Monad.State
import Data.String
import qualified Data.Map as M
import Data.Functor.Identity
import Data.HList hiding (get)
data Value a where
Lit :: a -> Value a -- ^ a literal holding a value
Var :: VarName a -> Value a -- ^ variable name
FromCmd :: (KnownCommand cmd) => Command (Stdout cmd a) -> Value a -- ^ value from a command
newtype VarName a = VarName String
deriving (IsString, Ord, Eq) via String
newtype Path = Path String
deriving (IsString, Ord, Eq) via String
data Command o where
CmdStdin :: (HasStdin cmd i, EmbedValue i) => Params cmd -> Stdin i -> Command (Stdout cmd o)
CmdNoStdin :: Params cmd -> Command (Stdout cmd o)
Pipe :: (KnownCommand cmda) => Command (Stdout cmda a) -> (Stdin a -> Command (Stdout cmdb b)) -> Command (Stdout cmdb b)
(|>) :: (KnownCommand cmda) => Command (Stdout cmda a) -> (Stdin a -> Command (Stdout cmdb b)) -> Command (Stdout cmdb b)
|>) = Pipe
(
data Stdin a
= Stdin
| FromValue (Value a) -- should be 'a'
| FromFile (Value Path)
data Stdout cmd b = Stdout
data Output (a :: k)
data Or (a :: k) (b :: k) -- = Or (Output a) (Output b)
data Many (a :: k)
data Func (inp :: [k]) (out :: [k])
data BashStruct a where
Sequential :: BashStruct as -> BashStruct bs -> BashStruct (as ++ bs)
IfElse :: Value Bool -> BashStruct as -> BashStruct bs -> BashStruct '[Or as bs]
If :: Value Bool -> BashStruct as -> BashStruct '[Or as '[]]
While :: Value Bool -> BashStruct ls -> BashStruct '[Many ls]
Command :: KnownCommand cmd => Command (Stdout cmd a) -> BashStruct '[a]
Assign :: EmbedValue a => VarName a -> Value a -> BashStruct '[]
FunctionDec :: VarName (Func input output) -> BashStruct output -> BashStruct '[]
FunctionApp :: VarName (Func input output) -> FList Value input -> BashStruct output
functionCall :: Monad m
=> VarName (Func input output)
-> FList Value input
-> GHB m (Command (Stdout (Func input output) output))
= undefined
functionCall
trimString :: BashStruct '[]
= FunctionDec "trimString" $ Command (echo [Var "1"] |> cat nop)
trimString
(>>>) :: BashStruct as -> BashStruct bs -> BashStruct (as ++ bs)
>>>) = Sequential
(infixr 5 >>>
class Assignable a where
type AssignedType a
(@=) :: VarName (AssignedType a) -> a -> BashStruct '[]
instance (EmbedValue a, KnownCommand cmd) => Assignable (Command (Stdout cmd a)) where
type AssignedType (Command (Stdout cmd a)) = a
@= cmd = Assign vname $ FromCmd cmd
vname
type PureGHB a = GHB Identity a
type BashLine = String
ghb :: BashStruct a -> PureGHB [BashLine]
Sequential s1 s2) = (++) <$> ghb s1 <*> ghb s2
ghb (If cond sif) = do
ghb (<- insertIndent $ "if [" ++ embedValue cond ++ "]; then"
ifcondthen <- increaseIndent >> ghb sif
blockif <- decreaseIndent >> insertIndent "fi"
endif return $ ifcondthen : blockif ++ [endif]
IfElse cond sif selse) = do
ghb (<- insertIndent $ "if [" ++ embedValue cond ++ "]; then"
ifcondthen <- increaseIndent >> ghb sif
blockif <- decreaseIndent >> insertIndent "else"
else' <- increaseIndent >> ghb selse
blockelse <- decreaseIndent >> insertIndent "fi"
endif return $ ifcondthen : blockif ++ else' : blockelse ++ [endif]
While cond s) = do
ghb (<- insertIndent $ "while [" ++ embedValue cond ++ "]; do"
while <- increaseIndent >> ghb s
block <- decreaseIndent >> insertIndent "done"
endwhile return $ while : block ++ [endwhile]
Command cmd) = pure <$> insertIndent (protoCompiler cmd)
ghb (Assign vname val) = pure <$> insertIndent (embedVarAtAssignment vname ++ "=" ++ embedValue val)
ghb (FunctionDec fname body) = do
ghb (let funcname = embedVarAtAssignment fname
<- insertIndent $ "function " ++ funcname ++ " {"
funcdec <- increaseIndent >> ghb body
block <- decreaseIndent >> insertIndent "}"
endfunc return $ funcdec : block ++ [endfunc]
getIndent :: Monad m => GHB m Indent
= gets _indent
getIndent
increaseIndent :: Monad m => GHB m ()
= modify $ \bstate -> bstate { _indent = Indent $ unIndent (_indent bstate) + 1 }
increaseIndent
decreaseIndent :: Monad m => GHB m ()
= modify $ \bstate -> bstate { _indent = Indent $ unIndent (_indent bstate) - 1 }
decreaseIndent
insertIndent :: Monad m => BashLine -> GHB m BashLine
= do
insertIndent str Indent i <- getIndent
return $ replicate (indentUnit * i) ' ' ++ str
where indentUnit = 4
-- | This function declares a new variable name
newName :: Monad m => String -> GHB m (VarName a)
= do
newName vname BState currentMap position <- get
case M.lookup vname currentMap of
Just idmap ->
let mmaxid = M.lookupMax idmap in
case mmaxid of
Nothing -> do
let newvarName = vname ++ "_0"
$ BState (M.insert vname (M.insert 0 True idmap) currentMap) position
put return $ VarName newvarName
Just (maxid, _) -> do
let newvarName = vname ++ "_" ++ show (maxid + 1)
$ BState (M.insert vname (M.insert (maxid + 1) True idmap) currentMap) position
put return $ VarName newvarName
Nothing -> do
let newvarName = vname ++ "_0"
$ BState (M.insert vname (M.singleton 0 True) currentMap) position
put return $ VarName newvarName
type GHB m a = StateT BState m a
-- this is a monad where we can keep track of the state of the bash script, i.e. declarations of variables and functions
type (#>) = M.Map
infixr 5 #>
type InScope = Bool
data BState = BState { _vars :: String #> Int #> InScope, _indent :: Indent }
newtype Indent = Indent { unIndent :: Int }
data Position = Position { row :: Int, col :: Int }
instance IsString (Value String) where fromString = Lit
instance IsString (Value (Maybe String)) where
= Lit Nothing
fromString [] = Lit $ Just xs
fromString xs
class HasStdin cmd i
instance HasStdin CAT String
data CAT
data LS
data ECHO a
data RSYNC
type family Params cmd
type instance Params CAT = Value (Maybe String)
type instance Params LS = Value (Maybe String)
type instance Params (ECHO a) = [Value a] -- echo has a mandatory argument
--type instance Params RSYNC =
ls :: Params LS -> Command (Stdout LS String)
= CmdNoStdin @LS
ls
cat :: Params CAT -> Stdin String -> Command (Stdout CAT String)
= CmdStdin @CAT
cat
echo :: forall a. Params (ECHO a) -> Command (Stdout (ECHO a) a)
= CmdNoStdin @(ECHO a)
echo
class KnownCommand cmd where
applyCommand :: Params cmd -> String
instance KnownCommand LS where
Lit Nothing) = "ls"
applyCommand (= "ls " ++ embedValue v
applyCommand v
instance KnownCommand CAT where
Lit Nothing) = "cat"
applyCommand (= "cat " ++ embedValue v
applyCommand v
instance EmbedValue a => KnownCommand (ECHO a) where
= "echo " ++ unwords (embedValue <$> v)
applyCommand v
class EmbedLit a where
embedLit :: a -> String
instance EmbedLit String where
= "\"" ++ str ++ "\""
embedLit str
class EmbedValue a where
embedValue :: Value a -> String
embedVarAtApplication :: VarName a -> String
VarName vname) = "${" ++ vname ++ "}"
embedVarAtApplication (
embedVarAtAssignment :: VarName a -> String
VarName vname) = vname
embedVarAtAssignment (
instance EmbedValue String where
Lit a) = embedLit a
embedValue (Var vname) = embedVarAtApplication vname
embedValue (FromCmd cmd) = slice $ protoCompiler cmd
embedValue (
instance EmbedValue Path where
Lit (Path path)) = embedLit path
embedValue (Var vname) = embedVarAtApplication vname
embedValue (FromCmd cmd) = slice $ protoCompiler cmd
embedValue (
instance EmbedValue (Maybe Path) where
Lit (Just (Path path))) = embedLit path
embedValue (Lit Nothing) = ""
embedValue (Var vname) = embedVarAtApplication vname
embedValue (FromCmd cmd) = slice $ protoCompiler cmd
embedValue (
instance EmbedValue (Maybe String) where
Lit (Just str)) = embedLit str
embedValue (Lit Nothing) = ""
embedValue (Var vname) = embedVarAtApplication vname
embedValue (FromCmd cmd) = slice $ protoCompiler cmd
embedValue (
instance EmbedValue Bool where
Lit True) = "true"
embedValue (Lit False) = "false"
embedValue (Var vname) = embedVarAtApplication vname
embedValue (FromCmd cmd) = slice $ protoCompiler cmd
embedValue (
slice :: String -> String
= "$(" ++ str ++ ")"
slice str
nop :: Value (Maybe a)
= Lit Nothing
nop
nos :: Stdin a
= Stdin
nos
protoCompiler :: forall cmd a. (KnownCommand cmd) => Command (Stdout cmd a) -> String
CmdStdin p Stdin) = applyCommand @cmd p
protoCompiler (CmdStdin p (FromValue vstdin@(Lit _))) = applyCommand @cmd p ++ " <<EOF\n" ++ embedValue vstdin ++ "\nEOF"
protoCompiler (CmdStdin p (FromValue (Var vn))) = protoCompiler (echo [Var vn] |> CmdStdin @cmd p)
protoCompiler (CmdStdin p (FromValue (FromCmd cmd))) = protoCompiler (cmd |> CmdStdin @cmd p)
protoCompiler (CmdStdin p (FromFile vfilepath)) = applyCommand @cmd p ++ " < " ++ embedValue vfilepath
protoCompiler (CmdNoStdin p) = applyCommand @cmd p
protoCompiler (Pipe c1 c2) = protoCompiler c1 ++ " | " ++ protoCompiler (c2 Stdin)
protoCompiler (
= Var "HOME"
home
program :: Command (Stdout CAT String)
= ls home |> cat nop |> cat nop
program
= toBashLines programb
test
toBash :: PureGHB (BashStruct a) -> String
= unlines . toBashLines
toBash
toBashLines :: PureGHB (BashStruct a) -> [BashLine]
= runIdentity $ evalStateT (ghb =<< prog) (BState M.empty (Indent 0))
toBashLines prog
varJust :: VarName a -> Value (Maybe a)
VarName vname) = Var (VarName vname)
varJust (
programb :: PureGHB (BashStruct '[String])
= do
programb <- replicateM 10 $ newName "lshome"
lshome <- replicateM 10 $ newName "catlshome"
catlshome return $ foldr1 (>>>) (map (@= ls (Var "HOME")) lshome)
>>> foldr1 (>>>) (zipWith (\v a -> v @= echo @String [Var a]) catlshome lshome)
>>> Command (echo $ Var <$> catlshome)