Command Line Interface¶
The command line interface (CLI) is the primary way of using mltype. After installation, the entrypoint mlt will be available in the path.
$ mlt
Usage: mlt [OPTIONS] COMMAND [ARGS]...

  Tool for improving typing speed and accuracy

Options:
  --help  Show this message and exit.

Commands:
  file    Type text from a file.
  ls      List all language models
  random  Sample characters randomly from a provided vocabulary
  raw     Provide text manually
  replay  Compete against a past performance
  sample  Sample text from a language
  train   Train a language
Note that mltype uses the folder ~/.mltype (in the home directory) for storing all relevant data. See the usual structure below.
- .mltype/
  - checkpoints/
    - a/ # training checkpoints of model a
    - b/ # training checkpoints of model b
  - languages/
    - a # some model
    - b # some other model
    ...
  - logs/
    ...
file¶
Type random (or fixed) lines from a text file. This command has two main modes:
- Random lines - Select random consecutive lines. One needs to specify --n-lines and optionally --random-state (for reproducibility); see the sketch after this list.
- Fixed lines - One needs to specify --start-line and --end-line.
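Conceptually, the random mode amounts to drawing a random starting line and taking a contiguous block. A minimal sketch of that idea (the function name and details are illustrative assumptions, not mltype's actual internals):

import random

def pick_random_lines(path, n_lines, random_state=None):
    """Pick n_lines consecutive lines from a text file at random.

    Illustrative sketch only - not mltype's actual implementation.
    """
    rng = random.Random(random_state)  # seeding makes the choice reproducible
    with open(path) as f:
        lines = f.read().splitlines()
    start = rng.randint(0, len(lines) - n_lines)  # both bounds inclusive
    return lines[start:start + n_lines]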
Arguments¶
PATH - Path to the text file to read from
Options¶
-e, --end-line INTEGER - The end line of the excerpt to use. Needs to be used together with start-line.
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-l, --n-lines INTEGER - Number of consecutive lines to be selected at random. Cannot be used together with start-line and end-line.
-o, --output-file PATH - Path to where to save the result file
-r, --random-state INTEGER - Random state for reproducible results
-s, --start-line INTEGER - The start line of the excerpt to use. Needs to be used together with end-line.
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-w, --include-whitespace - Include whitespace characters
Examples¶
Let us first create a text file
echo $'zeroth\nfirst\nsecond\nthird\nfourth\nfifth\nsixth' > text.txt
cat text.txt
zeroth
first
second
third
fourth
fifth
sixth
To select contiguous lines randomly, one can specify -l, --n-lines representing the number of lines to use.
mlt file -l 2 text.txt
This would open the typing interface with 2 random contiguous lines
second third
The other option would be to use the deterministic mode and select the starting and ending line manually
mlt file -s 0 -e 3 text.txt
zeroth first second
As with many other commands, one can specify a target speed and an output file. Note that we follow the Python convention: line counting starts from zero and the intervals contain the starting line but not the ending one.
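In Python terms, the -s 0 -e 3 call above corresponds to the following slice:

lines = ["zeroth", "first", "second", "third", "fourth", "fifth", "sixth"]
lines[0:3]  # ['zeroth', 'first', 'second'] - line 3 ("third") is excluded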
Note that one can keep the whitespace characters (including newlines) in the text by adding the -w, --include-whitespace option
mlt file -l 2 -w text.txt
second
third
ls¶
List available language models. One can use them with sample. Please check the official GitHub to download pretrained models - mltype github.
Note
mlt ls simply lists all the files present in ~/.mltype/languages.
random¶
Generate a random sequence of characters based on the provided counts. The absolute counts are converted to relative counts (a probability distribution) that we then sample from.
Note
mlt random samples characters independently, unlike mlt sample which conditions on previous characters.
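Conceptually, the command could be implemented as in the following sketch (an illustration, not mltype's actual code); each character's sampling probability is proportional to how often it occurs in CHARACTERS:

import random
from collections import Counter

def sample_independent(characters, n_chars):
    """Sample n_chars characters, each drawn independently of the others."""
    counts = Counter(characters)              # absolute counts per character
    chars = list(counts)
    weights = [counts[c] for c in chars]      # proportional to occurrences
    return "".join(random.choices(chars, weights=weights, k=n_chars))

print(sample_independent("123455556666789 ", 60))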
Arguments¶
CHARACTERS - Characters to include in the vocabulary. The higher the number of occurrences of a given character, the higher the probability of this character being sampled.
Options¶
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-n, --n-chars INTEGER - Number of characters to sample
-o, --output-file PATH - Path to where to save the result file
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
Examples¶
Let’s say we want to practise typing digits. However, we would like to spend more time on 5’s and 6’s since they are harder.
mlt random "123455556666789 "
This would give us something like the following.
546261561 3566 53 5496 556659554 435 1386559569 5 85641553465118589
We see that the most frequent characters are 5’s, 6’s and spaces.
raw¶
Provide text manually.
Arguments¶
TEXT - Text to be transferred to the typing interface
Options¶
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-o, --output-file PATH - Path to where to save the result file
-r, --raw-string - If active, then newlines and tabs are not seen as special characters
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
Examples¶
Let’s say we have some text in the clipboard that we just paste and type. Additionally, we want to see the 80 word per minute (WPM) marker. Lastly, no errors are acceptable—instant death mode.
mlt raw -i -t 80 "Hello world I will write you quickly"
Hello world I will write you quickly
replay¶
Play against a past performance. To save a past performance one can use the -o, --output-file option of the file, random, raw and sample commands.
Arguments¶
REPLAY_FILE - Past performance to play against
Options¶
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-w, --overwrite - Overwrite in place if faster
Examples¶
We ran mlt sample ... -o replay_file and we are not particularly happy about the performance. We would like to replay the same text and try to improve our speed. In case we do, we would like the replay_file to be updated automatically (using the -w, --overwrite option).
mlt replay -w replay_file
Some text we already typed before.
sample¶
Generate text using a character-level language model.
Note
As opposed to mlt random, the mlt sample command takes all the previous characters into consideration and can therefore generate more realistic text.
To see all the available models use ls. Please check the official GitHub to download pretrained models - mltype github.
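Schematically, such conditional generation repeatedly feeds the text produced so far back into the model. The sketch below is a loose illustration, not mltype's API; the predict_probs method is a hypothetical stand-in. It also shows where the --top-k option from below would plug in:

import random

def autoregressive_sample(model, starting_text="", n_chars=100, top_k=None):
    """Generate text one character at a time, conditioning on all text so far."""
    text = starting_text
    for _ in range(n_chars):
        probs = model.predict_probs(text)         # hypothetical: char -> probability
        items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        if top_k is not None:
            items = items[:top_k]                 # keep the k most probable characters
        chars, weights = zip(*items)
        text += random.choices(chars, weights=weights, k=1)[0]
    return text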
Arguments¶
MODEL_NAME - Name of the language model
Options¶
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-k, --top-k INTEGER - Consider only the top k most probable characters
-n, --n-chars INTEGER - Number of characters to generate
-o, --output-file PATH - Path to where to save the result file
-r, --random-state INTEGER - Random state for reproducible results
-s, --starting-text TEXT - Initial text used as a starting condition
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-v, --verbose - Show a progress bar when generating text
Examples¶
We want to practise typing Python without having to find real source code. Assuming we have a decent language model for Python (see train) called amazing_python_model, we can do the following
mlt sample amazing_python_model
spatial_median(X, method="lar", call='Log', Cov']) glm.fit(X, y) assert_all
close(ref_no_encoded_c
Maybe we would like to give the model some initial text and let it complete it for us.
mlt sample -s "@pytest.mark.parametrize" amazing_python_model
@pytest.mark.parametrize('solver', ['sparse_cg', 'sag', 'saga'])
@pytest.mark.parametrize('copy_X', ['not a number', -0.10]]
train¶
Train a character-level language model. The trained model can then be used with sample.
In the background, we use an LSTM followed by a feedforward network to achieve this task. The user can set most of the important hyperparameters via the CLI options. Note that one can train without a GPU; however, to get access to bigger networks and faster training (~minutes/hours) GPUs are recommended.
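To make the setup concrete, below is a minimal PyTorch sketch of what a character-level LSTM followed by a feedforward head might look like. The class and layer names are illustrative assumptions, not mltype's actual code; the default sizes mirror the hyperparameters recommended in the Examples:

import torch
from torch import nn

class CharLanguageModel(nn.Module):
    """Predict the next character from a window of previous characters."""

    def __init__(self, vocab_size=70, hidden_size=512, dense_size=1024, n_layers=3):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=vocab_size,   # one-hot encoded characters
            hidden_size=hidden_size,
            num_layers=n_layers,
            batch_first=True,
        )
        self.dense = nn.Linear(hidden_size, dense_size)
        self.out = nn.Linear(dense_size, vocab_size)

    def forward(self, x):
        # x: (batch, window_size, vocab_size)
        _, (h, _) = self.lstm(x)     # h: (n_layers, batch, hidden_size)
        logits = self.out(torch.relu(self.dense(h[-1])))
        return logits                # (batch, vocab_size) - next-character scores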
Arguments¶
PATH_1, PATH_2, … - Paths to files or folders containing text to be trained on
MODEL_NAME - Name of the trained model
Options¶
-b, --batch-size INTEGER - Number of samples in a batch
-c, --checkpoint-path PATH - Load a checkpoint and continue training it
-d, --dense-size INTEGER - Size of the dense layer
-e, --extensions TEXT - Comma-separated list of allowed extensions
-f, --fill-strategy TEXT - Either zeros or skip. Determines how to deal with out-of-vocabulary characters
-g, --gpus INTEGER - Number of GPUs. If not specified, then none. If -1, then all.
-h, --hidden-size INTEGER - Size of the hidden state
-i, --illegal-chars TEXT - Characters to exclude from the vocabulary
-l, --n-layers INTEGER - Number of layers in the recurrent network
-m, --use-mlflow - Use MLflow for logging
-n, --max-epochs INTEGER - Maximum number of epochs
-o, --output-path PATH - Custom path where to save the trained models and logging details. If not provided, it defaults to ~/.mltype.
-s, --early-stopping - Enable early stopping based on validation loss
-t, --train-test-split FLOAT - Train test split - a value between (0, 1)
-v, --vocab-size INTEGER - Number of the most frequent characters to include in the vocabulary
-w, --window-size INTEGER - Number of previous characters to consider for prediction
Examples¶
Let’s assume we have a book in full text saved in the book.txt file. Our goal would be to train a model that learns the language used in this book and is able to sample new pieces of text that resemble the original.
See below a list of hyperparameters that work reasonably well; the training can be done in a few hours (on a GPU)
--batch-size 128
--dense-size 1024
--early-stopping
--gpus 1
--hidden-size 512
--max-epochs 10
--n-layers 3
--vocab-size 70
--window-size 100
So overall the command looks like
mlt train book.txt cool_model -b 128 -d 1024 -s -g 1 -h 512 -n 10 -l 3 -v 70 -w 100
During the training, one can see progress bars and the training and validation loss (using pytorch-lightning in the background). Once the training is done, the best model (based on the validation loss) will be stored in ~/.mltype/languages/cool_model.
There are several important customizations that one should be aware of.
Using MLflow
If one wants to get more training progress information, there is the flag --use-mlflow (requiring mlflow to be installed). To launch the UI, run the following commands
cd ~/.mltype/logs
mlflow ui
Multiple files
mlt train supports training from multiple files and folders. This is really useful if we want to recursively create a training set out of all files in a given folder (e.g. a GitHub repository). Additionally, one can use the --extensions option to control what files are considered when traversing a folder.
mlt train main.py folder_with_a_lot_of_files model --extensions ".py"
The above command will create a training set out of all files inside the folder_with_a_lot_of_files folder that have the “.py” suffix, together with main.py.
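The traversal presumably amounts to recursively walking folders and filtering by suffix; a minimal illustrative sketch (the helper below is an assumption, not mltype's internals):

from pathlib import Path

def gather_files(paths, extensions=None):
    """Collect candidate text files from a mix of file and folder paths."""
    result = []
    for p in map(Path, paths):
        # folders are walked recursively, plain files are taken as-is
        candidates = (f for f in p.rglob("*") if f.is_file()) if p.is_dir() else [p]
        for f in candidates:
            if extensions is None or f.suffix in extensions:
                result.append(f)
    return result

gather_files(["main.py", "folder_with_a_lot_of_files"], extensions={".py"})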
Excluding undesirable characters
If the input files contain some characters that we do not want the model to have in its vocabulary, we can simply use the --illegal-chars option. Internally, when an out-of-vocabulary character is encountered, there are two strategies to handle it (controlled via --fill-strategy):
- zeros - a vector of zeros is used
- skip - only consider samples that do not have out-of-vocabulary characters anywhere in their window
A sketch of both strategies follows the example below.
mlt train book.txt cool_model --illegal-chars "~{}`[]"
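Below is a heavily simplified sketch of the two strategies, assuming windows are one-hot encoded (the names and encoding details are assumptions, not mltype's actual code):

import numpy as np

def encode_window(window, char_to_idx, fill_strategy="zeros"):
    """One-hot encode a window of characters, handling out-of-vocabulary ones."""
    if fill_strategy == "skip" and any(c not in char_to_idx for c in window):
        return None                               # drop the whole sample
    encoded = np.zeros((len(window), len(char_to_idx)))
    for i, c in enumerate(window):
        if c in char_to_idx:
            encoded[i, char_to_idx[c]] = 1.0      # one-hot position
        # "zeros": unknown characters stay as all-zero vectors
    return encoded

vocab = {c: i for i, c in enumerate("abcdef ")}
encode_window("bad ~cafe", vocab, fill_strategy="zeros")  # "~" row is all zeros
encode_window("bad ~cafe", vocab, fill_strategy="skip")   # returns None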