Command Line Interface
The command line interface (CLI) is the primary way of using mltype. After installation, one can use the entrypoint mlt, which will be available on the path.
$ mlt
Usage: mlt [OPTIONS] COMMAND [ARGS]...
Tool for improving typing speed and accuracy
Options:
--help Show this message and exit.
Commands:
file Type text from a file.
ls List all language models
random Sample characters randomly from a provided vocabulary
raw Provide text manually
replay Compete against a past performance
sample Sample text from a language
train Train a language
Note that mltype uses the folder ~/.mltype (in the home directory) for storing all relevant data. See the usual structure below.
- .mltype/
- config.ini
- checkpoints/
- a/ # training checkpoints of model a
- b/ # training checkpoints of model b
- languages/
- a # some model
- b # some other model
...
- logs/
...
file
Type random (or fixed) lines from a text file. This command has two main modes:
Random lines - Select random consecutive lines. One needs to specify --n-lines and optionally --random-state (for reproducibility).
Fixed lines - One needs to specify --start-line and --end-line.
Arguments
PATH - Path to the text file to read from
Options
-e, --end-line INTEGER - The end line of the excerpt to use. Needs to be used together with --start-line.
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-l, --n-lines INTEGER - Number of consecutive lines to be selected at random. Cannot be used together with --start-line and --end-line.
-o, --output-file PATH - Path to where to save the result file
-r, --random-state INTEGER - Random state for reproducible results
-s, --start-line INTEGER - The start line of the excerpt to use. Needs to be used together with --end-line.
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-w, --include-whitespace - Include whitespace characters
Examples
Let us first create a text file
echo $'zeroth\nfirst\nsecond\nthird\nfourth\nfifth\nsixth' > text.txt
cat text.txt
zeroth
first
second
third
fourth
fifth
sixth
To select contiguous lines randomly, one can specify -l, --n-lines representing the number of lines to use.
mlt file -l 2 text.txt
Which would open the typing interface with 2 random contiguous lines
second third
The other option would be to use the deterministic mode and select the starting and ending line manually
mlt file -s 0 -e 3 text.txt
zeroth first second
As with multiple other commands, one can specify a target speed and an output file. Note that we follow the Python convention: line counting starts from zero and the interval contains the starting line but not the ending one.
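This convention maps directly onto Python slicing; the following sketch mirrors the text.txt example with the half-open interval [start, end):

```python
# Zero-based, half-open line selection, mirroring "mlt file -s 0 -e 3".
lines = ["zeroth", "first", "second", "third", "fourth", "fifth", "sixth"]

start_line, end_line = 0, 3
excerpt = lines[start_line:end_line]  # includes start_line, excludes end_line
print(" ".join(excerpt))  # zeroth first second
```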
Note that one can keep the whitespace characters (including newlines) in the text by adding the -w, --include-whitespace option
mlt file -l 2 -w text.txt
second
third
ls
List available language models. One can use them with sample.
Please check the official GitHub repository to download pretrained models - mltype GitHub.
Note
mlt ls simply lists all the files present in ~/.mltype/languages.
Examples
mlt ls
python
some_amazing_model
wikipedia
random
Generate a random sequence of characters based on provided counts. The absolute counts are converted to relative counts (a probability distribution) that we sample from.
Note
mlt random samples characters independently, unlike mlt sample which conditions on previous characters.
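A minimal sketch of this kind of independent, count-weighted sampling (the sample_random_chars helper is hypothetical, not part of the mltype API):

```python
import random
from collections import Counter

def sample_random_chars(characters, n_chars, seed=None):
    # Hypothetical helper: absolute character counts are converted to a
    # probability distribution, and each character is then drawn from it
    # independently of the previously drawn ones.
    rng = random.Random(seed)
    counts = Counter(characters)
    population = list(counts)
    weights = [counts[c] for c in population]  # relative frequencies
    return "".join(rng.choices(population, weights=weights, k=n_chars))

print(sample_random_chars("123455556666789 ", n_chars=30, seed=0))
```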
Arguments
CHARACTERS - Characters to include in the vocabulary. The higher the number of occurrences of a given character, the higher the probability of this character being sampled.
Options
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-n, --n-chars INTEGER - Number of characters to sample
-o, --output-file PATH - Path to where to save the result file
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
Examples
Let’s say we want to practise typing digits. However, we would like to spend more time on 5’s and 6’s since they are harder.
mlt random "123455556666789 "
This would give us something like this.
546261561 3566 53 5496 556659554 435 1386559569 5 85641553465118589
We see that the most frequent characters are 5’s, 6’s and spaces.
raw
Provide text manually.
Arguments
TEXT - Text to be transferred to the typing interface
Options
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-o, --output-file PATH - Path to where to save the result file
-r, --raw-string - If active, then newlines and tabs are not seen as special characters
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
Examples
Let’s say we have some text in the clipboard that we just paste and type. Additionally, we want to see the 80 words per minute (WPM) marker. Lastly, no errors are acceptable: instant death mode.
mlt raw -i -t 80 "Hello world I will write you quickly"
Hello world I will write you quickly
replay
Play against a past performance. To save a past performance, one can use the -o, --output-file option of the other commands.
Arguments
REPLAY_FILE
- Past performance to play against
Options
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-w, --overwrite PATH - Overwrite in place if faster
Examples
We ran mlt sample ... -o replay_file
and we are not particularly happy
about the performance. We would like to replay the same text and try to
improve our speed. In case we do, we would like the replay_file
to be
updated automatically (using the -w, --overwrite
option).
mlt replay -w replay_file
Some text we already typed before.
sample
Generate text using a character-level language model.
Note
As opposed to mlt random, the mlt sample command takes into consideration all the previous characters and can therefore generate more realistic text.
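To make the contrast with mlt random concrete, here is a toy sketch of autoregressive sampling. The bigram table is a hypothetical stand-in for the trained network and conditions only on the last character; the real model looks at a whole window of previous characters:

```python
import random

# Toy "model": bigram counts from a tiny corpus, with add-one smoothing.
corpus = "abracadabra abracadabra"
vocab = sorted(set(corpus))
bigram = {c: {v: 1 for v in vocab} for c in vocab}
for prev, nxt in zip(corpus, corpus[1:]):
    bigram[prev][nxt] += 1

def sample(n_chars, starting_text="a", top_k=3, seed=0):
    # Autoregressive sampling: each new character is drawn from a
    # distribution conditioned on what was generated so far (here, only
    # the last character), keeping only the top-k most probable options.
    rng = random.Random(seed)
    out = list(starting_text)
    for _ in range(n_chars):
        dist = bigram[out[-1]]
        candidates = sorted(dist, key=dist.get, reverse=True)[:top_k]
        weights = [dist[c] for c in candidates]
        out.append(rng.choices(candidates, weights=weights)[0])
    return "".join(out)

print(sample(20))
```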
To see all the available models use ls. Please check the official GitHub repository to download pretrained models - mltype GitHub.
Arguments
MODEL_NAME
- Name of the language model
Options
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-k, --top-k INTEGER - Consider only the top k most probable characters
-n, --n-chars INTEGER - Number of characters to generate
-o, --output-file PATH - Path to where to save the result file
-r, --random-state INTEGER - Random state for reproducible results
-s, --starting-text TEXT - Initial text used as a starting condition
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-v, --verbose - Show a progress bar when generating text
Examples
We want to practise typing Python without having to look for real source code. Assuming we have a decent language model for Python (see train) called amazing_python_model, we can do the following
mlt sample amazing_python_model
spatial_median(X, method="lar", call='Log', Cov']) glm.fit(X, y) assert_all
close(ref_no_encoded_c
Maybe we would like to give the model some initial text and let it complete it for us.
mlt sample -s "@pytest.mark.parametrize" amazing_python_model
@pytest.mark.parametrize('solver', ['sparse_cg', 'sag', 'saga'])
@pytest.mark.parametrize('copy_X', ['not a number', -0.10]]
train
Train a character-level language model. The trained model can then be used with sample.
In the background, we use an LSTM and feedforward network architecture to achieve this task. The user can set most of the important hyperparameters via the CLI options. Note that one can train without a GPU; however, to get access to bigger networks and faster training (~minutes/hours), GPUs are recommended.
Arguments
PATH_1, PATH_2, … - Paths to files or folders containing text to be trained on
MODEL_NAME - Name of the trained model
Options
-b, --batch-size INTEGER - Number of samples in a batch
-c, --checkpoint-path PATH - Load a checkpoint and continue training it
-d, --dense-size INTEGER - Size of the dense layer
-e, --extensions TEXT - Comma-separated list of allowed extensions
-f, --fill-strategy TEXT - Either zeros or skip. Determines how to deal with out-of-vocabulary characters
-g, --gpus INTEGER - Number of GPUs. If not specified, then none. If -1, then all.
-h, --hidden-size INTEGER - Size of the hidden state
-i, --illegal-chars TEXT - Characters to exclude from the vocabulary
-l, --n-layers INTEGER - Number of layers in the recurrent network
-m, --use-mlflow - Use MLflow for logging
-n, --max-epochs INTEGER - Maximum number of epochs
-o, --output-path PATH - Custom path where to save the trained models and logging details. If not provided, it defaults to ~/.mltype.
-s, --early-stopping - Enable early stopping based on validation loss
-t, --train-test-split FLOAT - Train test split - value between (0, 1)
-v, --vocab-size INTEGER - Number of the most frequent characters to include in the vocabulary
-w, --window-size INTEGER - Number of previous characters to consider for prediction
Examples
Let’s assume we have a book in full text saved in the book.txt file. Our goal is to train a model that learns the language used in this book and is able to sample new pieces of text that resemble the original.
See below a list of hyperparameters that work reasonably well; the training can be done in a few hours (on a GPU)
--batch-size 128
--dense-size 1024
--early-stopping
--gpus 1
--hidden-size 512
--max-epochs 10
--n-layers 3
--vocab-size 70
--window-size 100
So overall the command looks like
mlt train book.txt cool_model -n 10 -s -g 1 -b 128 -l 3 -h 512 -d 1024 -w 100 -v 70
During the training, one can see progress bars and the training and validation loss (using pytorch-lightning in the background).
Once the training is done, the best model (based on the validation loss) will be stored in ~/.mltype/languages/cool_model.
There are several important customizations that one should be aware of.
Using MLflow
If one wants to get more training progress information, there is a flag --use-mlflow (requiring mlflow to be installed). To launch the UI, run the following commands
cd ~/.mltype/logs
mlflow ui
Multiple files
mlt train supports training from multiple files and folders. This is really useful if we want to recursively create a training set out of all files in a given folder (e.g. a GitHub repository). Additionally, one can use the --extensions option to control what files are considered when traversing a folder.
mlt train main.py folder_with_a_lot_of_files model --extensions ".py"
The above command will create a training set out of all files inside the folder_with_a_lot_of_files folder that have the “.py” suffix, plus main.py.
Excluding undesirable characters
If the input files contain some characters that we do not want the model to have in its vocabulary, we can simply use the --illegal-chars option. Internally, when an out-of-vocabulary character is encountered, there are two strategies to handle it (controlled via --fill-strategy):
zeros - a vector of zeros is used
skip - only consider samples that do not have out-of-vocabulary characters anywhere in their window
mlt train book.txt cool_model --illegal-chars "~{}`[]"
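The two fill strategies can be illustrated with a small one-hot encoding sketch (encode_windows is an illustrative helper, not mltype's internal API):

```python
def encode_windows(text, vocab, window_size, fill_strategy="zeros"):
    # "zeros": out-of-vocabulary characters become an all-zero vector.
    # "skip":  windows containing any out-of-vocabulary character are dropped.
    char_to_ix = {c: i for i, c in enumerate(vocab)}
    windows = []
    for i in range(len(text) - window_size + 1):
        window = text[i : i + window_size]
        if fill_strategy == "skip" and any(c not in char_to_ix for c in window):
            continue
        encoded = []
        for c in window:
            vec = [0] * len(vocab)
            if c in char_to_ix:
                vec[char_to_ix[c]] = 1
            encoded.append(vec)  # stays all zeros if c is out of vocabulary
        windows.append(encoded)
    return windows

zeros = encode_windows("ab~a", "ab", window_size=2, fill_strategy="zeros")
print(len(zeros))  # 3 windows; "~" is encoded as an all-zero vector
skip = encode_windows("ab~a", "ab", window_size=2, fill_strategy="skip")
print(len(skip))  # 1 window ("ab"); windows touching "~" are dropped
```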
Configuration file
mltype
supports a configuration file that can be used for the following
tasks.
Setting reasonable defaults for any of the CLI commands
Defining custom parameters that cannot be set via the CLI
The configuration file is optional and one does not have to create it. By default
it should be located under ~/.mltype/config.ini
. One can also pass it
dynamically via the --config
option available for all commands.
See below an example configuration file.
[general]
models_dir = /home/my_models
color_default_background = terminal
color_wrong_foreground = yellow
[sample]
# one needs to use underscores instead of hyphens
n_chars = 500
target_wpm = 70
[raw]
instant_death = True
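Such a file can be parsed with Python's standard configparser; the following is a plausible sketch of how values could be read (the exact mechanics inside mltype may differ):

```python
import configparser

# The same configuration as in the example above, as a string.
config_text = """
[general]
models_dir = /home/my_models
color_default_background = terminal
color_wrong_foreground = yellow

[sample]
n_chars = 500
target_wpm = 70

[raw]
instant_death = True
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# Option names use underscores; configparser reads them case-insensitively.
n_chars = parser.getint("sample", "n_chars")
instant_death = parser.getboolean("raw", "instant_death")
print(parser.get("general", "models_dir"), n_chars, instant_death)
```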
General section
The general
section can be used for defining special parameters
that cannot be set via the options of the CLI. Below is a complete list
of valid parameters.
models_dir: Alternative location of the language models. The default directory is ~/.mltype/languages. It influences the behavior of ls and sample.
color_default_background: Background color of a default character. Note that it is either a character that has not been typed yet or one that was backspaced (error correction).
color_default_foreground: Foreground (font) color of a default character
color_correct_background: Background color of a correct character
color_correct_foreground: Foreground color of a correct character
color_wrong_background: Background color of a wrong character
color_wrong_foreground: Foreground color of a wrong character
color_replay_background: Background color of a replay character
color_replay_foreground: Foreground color of a replay character
color_target_background: Background color of a target character
color_target_foreground: Foreground color of a target character
Note
Available colors
terminal - the color is inherited from the terminal
black
red
green
yellow
blue
magenta
cyan
white
Other sections
All the other sections are named identically to the commands, that is
file
ls
random
raw
replay
sample
train
Note that if the same option is specified both in the configuration file and as a CLI option, the CLI value takes preference.
Note
Formatting rules
The section names and parameter names are case insensitive
One needs to use underscores instead of hyphens