Command Line Interface¶
The command line interface (CLI) is the primary way of using mltype. After installation, the entrypoint mlt will be available in the path.
$ mlt
Usage: mlt [OPTIONS] COMMAND [ARGS]...

  Tool for improving typing speed and accuracy

Options:
  --help  Show this message and exit.

Commands:
  file    Type text from a file.
  ls      List all language models
  random  Sample characters randomly from a provided vocabulary
  raw     Provide text manually
  replay  Compete against a past performance
  sample  Sample text from a language
  train   Train a language
Note that mltype uses the folder ~/.mltype (in the home directory) for storing all relevant data. See the usual structure below.
- .mltype/
  - checkpoints/
    - a/ # training checkpoints of model a
    - b/ # training checkpoints of model b
  - languages/
    - a # some model
    - b # some other model
    ...
  - logs/
    ...
file¶
Type random (or fixed) lines from a text file. This command has two main modes:
- Random lines - Select random consecutive lines. One needs to specify --n-lines and optionally --random-state (for reproducibility); see the sketch after this list.
- Fixed lines - One needs to specify --start-line and --end-line.
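Conceptually, the random mode amounts to drawing a random starting line and taking a contiguous block. A minimal sketch of that idea (the function name and details are illustrative assumptions, not mltype's actual internals):

import random

def pick_random_lines(path, n_lines, random_state=None):
    """Pick n_lines consecutive lines from a text file at random.

    Illustrative sketch only - not mltype's actual implementation.
    """
    rng = random.Random(random_state)  # seeding makes the choice reproducible
    with open(path) as f:
        lines = f.read().splitlines()
    start = rng.randint(0, len(lines) - n_lines)  # both bounds inclusive
    return lines[start:start + n_lines]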
Arguments¶
PATH - Path to the text file to read from
Options¶
-e, --end-line INTEGER - The end line of the excerpt to use. Needs to be used together with start-line.
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-l, --n-lines INTEGER - Number of consecutive lines to be selected at random. Cannot be used together with start-line and end-line.
-o, --output-file PATH - Path to where to save the result file
-r, --random-state INTEGER - Random state for reproducible results
-s, --start-line INTEGER - The start line of the excerpt to use. Needs to be used together with end-line.
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-w, --include-whitespace - Include whitespace characters
Examples¶
Let us first create a text file
echo $'zeroth\nfirst\nsecond\nthird\nfourth\nfifth\nsixth' > text.txt
cat text.txt
zeroth
first
second
third
fourth
fifth
sixth
To select contiguous lines randomly, one can specify -l, --n-lines representing the number of lines to use.
mlt file -l 2 text.txt
This would open the typing interface with 2 random contiguous lines
second third
The other option would be to use the deterministic mode and select the starting and ending line manually
mlt file -s 0 -e 3 text.txt
zeroth first second
As with many other commands, one can specify a target speed and an output file. Note that we follow the Python convention: line counting starts from zero and the intervals contain the starting line but not the ending one.
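In Python terms, the -s 0 -e 3 call above corresponds to the following slice:

lines = ["zeroth", "first", "second", "third", "fourth", "fifth", "sixth"]
lines[0:3]  # ['zeroth', 'first', 'second'] - line 3 ("third") is excluded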
Note that one can keep the whitespace characters (including newlines) in the text by adding the -w, --include-whitespace option
mlt file -l 2 -w text.txt
second
third
ls¶
List available language models. One can use them with sample. Please check the official GitHub to download pretrained models - mltype github.
Note
mlt ls simply lists all the files present in ~/.mltype/languages.
random¶
Generate a random sequence of characters based on the provided counts. The absolute counts are converted to relative counts (a probability distribution) that we then sample from.
Note
mlt random samples characters independently, unlike mlt sample which conditions on previous characters.
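Conceptually, the command could be implemented as in the following sketch (an illustration, not mltype's actual code); each character's sampling probability is proportional to how often it occurs in CHARACTERS:

import random
from collections import Counter

def sample_independent(characters, n_chars):
    """Sample n_chars characters, each drawn independently of the others."""
    counts = Counter(characters)              # absolute counts per character
    chars = list(counts)
    weights = [counts[c] for c in chars]      # proportional to occurrences
    return "".join(random.choices(chars, weights=weights, k=n_chars))

print(sample_independent("123455556666789 ", 60))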
Arguments¶
CHARACTERS - Characters to include in the vocabulary. The higher the number of occurrences of a given character, the higher the probability of this character being sampled.
Options¶
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-n, --n-chars INTEGER - Number of characters to sample
-o, --output-file PATH - Path to where to save the result file
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
Examples¶
Let’s say we want to practise typing digits. However, we would like to spend more time on 5’s and 6’s since they are harder.
mlt random "123455556666789 "
This would give us something like the following.
546261561 3566 53 5496 556659554 435 1386559569 5 85641553465118589
We see that the most frequent characters are 5’s, 6’s and spaces.
raw¶
Provide text manually.
Arguments¶
TEXT - Text to be transferred to the typing interface
Options¶
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-o, --output-file PATH - Path to where to save the result file
-r, --raw-string - If active, then newlines and tabs are not seen as special characters
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
Examples¶
Let’s say we have some text in the clipboard that we just paste and type. Additionally, we want to see the 80 word per minute (WPM) marker. Lastly, no errors are acceptable—instant death mode.
mlt raw -i -t 80 "Hello world I will write you quickly"
Hello world I will write you quickly
replay¶
Play against a past performance. To save a past performance one can use the -o, --output-file option of the file, random, raw and sample commands.
Arguments¶
REPLAY_FILE - Past performance to play against
Options¶
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-w, --overwrite - Overwrite in place if faster
Examples¶
We ran mlt sample ... -o replay_file and we are not particularly happy about the performance. We would like to replay the same text and try to improve our speed. In case we do, we would like the replay_file to be updated automatically (using the -w, --overwrite option).
mlt replay -w replay_file
Some text we already typed before.
sample¶
Generate text using a character-level language model.
Note
As opposed to mlt random, the mlt sample command takes all the previous characters into consideration and can therefore generate more realistic text.
To see all the available models use ls. Please check the official GitHub to download pretrained models - mltype github.
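Schematically, such conditional generation repeatedly feeds the text produced so far back into the model. The sketch below is a loose illustration, not mltype's API; the predict_probs method is a hypothetical stand-in. It also shows where the --top-k option from below would plug in:

import random

def autoregressive_sample(model, starting_text="", n_chars=100, top_k=None):
    """Generate text one character at a time, conditioning on all text so far."""
    text = starting_text
    for _ in range(n_chars):
        probs = model.predict_probs(text)         # hypothetical: char -> probability
        items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        if top_k is not None:
            items = items[:top_k]                 # keep the k most probable characters
        chars, weights = zip(*items)
        text += random.choices(chars, weights=weights, k=1)[0]
    return text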
Arguments¶
MODEL_NAME - Name of the language model
Options¶
-f, --force-perfect - All characters need to be typed correctly
-i, --instant-death - End game after the first mistake
-k, --top-k INTEGER - Consider only the top k most probable characters
-n, --n-chars INTEGER - Number of characters to generate
-o, --output-file PATH - Path to where to save the result file
-r, --random-state INTEGER - Random state for reproducible results
-s, --starting-text TEXT - Initial text used as a starting condition
-t, --target-wpm INTEGER - The desired speed to be shown as a guide
-v, --verbose - Show a progress bar when generating text
Examples¶
We want to practise typing Python without having to find real source code. Assuming we have a decent language model for Python (see train) called amazing_python_model, we can do the following
mlt sample amazing_python_model
spatial_median(X, method="lar", call='Log', Cov']) glm.fit(X, y) assert_all
close(ref_no_encoded_c
Maybe we would like to give the model some initial text and let it complete it for us.
mlt sample -s "@pytest.mark.parametrize" amazing_python_model
@pytest.mark.parametrize('solver', ['sparse_cg', 'sag', 'saga'])
@pytest.mark.parametrize('copy_X', ['not a number', -0.10]]
train¶
Train a character-level language model. The trained model can then be used with sample.
In the background, we use an LSTM followed by a feedforward network to achieve this task. The user can set most of the important hyperparameters via the CLI options. Note that one can train without a GPU; however, to get access to bigger networks and faster training (~minutes/hours) GPUs are recommended.
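To make the setup concrete, below is a minimal PyTorch sketch of what a character-level LSTM followed by a feedforward head might look like. The class and layer names are illustrative assumptions, not mltype's actual code; the default sizes mirror the hyperparameters recommended in the Examples:

import torch
from torch import nn

class CharLanguageModel(nn.Module):
    """Predict the next character from a window of previous characters."""

    def __init__(self, vocab_size=70, hidden_size=512, dense_size=1024, n_layers=3):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=vocab_size,   # one-hot encoded characters
            hidden_size=hidden_size,
            num_layers=n_layers,
            batch_first=True,
        )
        self.dense = nn.Linear(hidden_size, dense_size)
        self.out = nn.Linear(dense_size, vocab_size)

    def forward(self, x):
        # x: (batch, window_size, vocab_size)
        _, (h, _) = self.lstm(x)     # h: (n_layers, batch, hidden_size)
        logits = self.out(torch.relu(self.dense(h[-1])))
        return logits                # (batch, vocab_size) - next-character scores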
Arguments¶
PATH_1, PATH_2, … - Paths to files or folders containing text to be trained on
MODEL_NAME - Name of the trained model
Options¶
-b, --batch-size INTEGER - Number of samples in a batch
-c, --checkpoint-path PATH - Load a checkpoint and continue training it
-d, --dense-size INTEGER - Size of the dense layer
-e, --extensions TEXT - Comma-separated list of allowed extensions
-f, --fill-strategy TEXT - Either zeros or skip. Determines how to deal with out-of-vocabulary characters
-g, --gpus INTEGER - Number of GPUs. If not specified, then none. If -1, then all.
-h, --hidden-size INTEGER - Size of the hidden state
-i, --illegal-chars TEXT - Characters to exclude from the vocabulary
-l, --n-layers INTEGER - Number of layers in the recurrent network
-m, --use-mlflow - Use MLflow for logging
-n, --max-epochs INTEGER - Maximum number of epochs
-o, --output-path PATH - Custom path where to save the trained models and logging details. If not provided, it defaults to ~/.mltype.
-s, --early-stopping - Enable early stopping based on validation loss
-t, --train-test-split FLOAT - Train test split - a value between (0, 1)
-v, --vocab-size INTEGER - Number of the most frequent characters to include in the vocabulary
-w, --window-size INTEGER - Number of previous characters to consider for prediction
Examples¶
Let’s assume we have a book in full text saved in the book.txt file. Our goal would be to train a model that learns the language used in this book and is able to sample new pieces of text that resemble the original.
See below a list of hyperparameters that work reasonably well; the training can be done in a few hours (on a GPU)
--batch-size 128
--dense-size 1024
--early-stopping
--gpus 1
--hidden-size 512
--max-epochs 10
--n-layers 3
--vocab-size 70
--window-size 100
So overall the command looks like
mlt train book.txt cool_model -b 128 -d 1024 -s -g 1 -h 512 -n 10 -l 3 -v 70 -w 100
During the training, one can see progress bars and the training and validation loss (using pytorch-lightning in the background). Once the training is done, the best model (based on the validation loss) will be stored in ~/.mltype/languages/cool_model.
There are several important customizations that one should be aware of.
Using MLflow
If one wants to get more training progress information, there is the flag --use-mlflow (requiring mlflow to be installed). To launch the UI, run the following commands
cd ~/.mltype/logs
mlflow ui
Multiple files
mlt train supports training from multiple files and folders. This is really useful if we want to recursively create a training set out of all files in a given folder (e.g. a GitHub repository). Additionally, one can use the --extensions option to control what files are considered when traversing a folder.
mlt train main.py folder_with_a_lot_of_files model --extensions ".py"
The above command will create a training set out of all files inside the folder_with_a_lot_of_files folder that have the “.py” suffix, together with main.py.
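The traversal presumably amounts to recursively walking folders and filtering by suffix; a minimal illustrative sketch (the helper below is an assumption, not mltype's internals):

from pathlib import Path

def gather_files(paths, extensions=None):
    """Collect candidate text files from a mix of file and folder paths."""
    result = []
    for p in map(Path, paths):
        # folders are walked recursively, plain files are taken as-is
        candidates = (f for f in p.rglob("*") if f.is_file()) if p.is_dir() else [p]
        for f in candidates:
            if extensions is None or f.suffix in extensions:
                result.append(f)
    return result

gather_files(["main.py", "folder_with_a_lot_of_files"], extensions={".py"})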
Excluding undesirable characters
If the input files contain some characters that we do not want the model to have in its vocabulary, we can simply use the --illegal-chars option. Internally, when an out-of-vocabulary character is encountered, there are two strategies to handle it (controlled via --fill-strategy):
- zeros - a vector of zeros is used
- skip - only consider samples that do not have out-of-vocabulary characters anywhere in their window
A sketch of both strategies follows the example below.
mlt train book.txt cool_model --illegal-chars "~{}`[]"
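Below is a heavily simplified sketch of the two strategies, assuming windows are one-hot encoded (the names and encoding details are assumptions, not mltype's actual code):

import numpy as np

def encode_window(window, char_to_idx, fill_strategy="zeros"):
    """One-hot encode a window of characters, handling out-of-vocabulary ones."""
    if fill_strategy == "skip" and any(c not in char_to_idx for c in window):
        return None                               # drop the whole sample
    encoded = np.zeros((len(window), len(char_to_idx)))
    for i, c in enumerate(window):
        if c in char_to_idx:
            encoded[i, char_to_idx[c]] = 1.0      # one-hot position
        # "zeros": unknown characters stay as all-zero vectors
    return encoded

vocab = {c: i for i, c in enumerate("abcdef ")}
encode_window("bad ~cafe", vocab, fill_strategy="zeros")  # "~" row is all zeros
encode_window("bad ~cafe", vocab, fill_strategy="skip")   # returns None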