2020-03-01·post

Conditioning Transformers

A brief exploration of how conditioning a transformer on a given context shapes the text it generates

Providing more conditioning gives more control over the output. This principle might be observed in neuron firing, decision theory, and process control.

Go to Runtime → Run all (Ctrl+F9) to experiment yourself. Have fun! 😊

Setup

Essential, but not the point of the research

%tensorflow_version 2.x

!pip install git+https://github.com/huggingface/transformers

import tensorflow as tf
import transformers

import logging
logging.getLogger('transformers.tokenization_utils').setLevel(logging.ERROR)
logging.getLogger('transformers.tokenization_utils').disabled = True

tokenizer = transformers.GPT2Tokenizer.from_pretrained('gpt2')
transformer = transformers.TFGPT2LMHeadModel.from_pretrained('gpt2')

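def speak(text):
    """sample a gpt-2 continuation of `text` and return it as a string"""
    inp_tokens = tokenizer.encode(text, return_tensors='tf')
    out_tokens = transformer.generate(
        inp_tokens,
        max_length=50,          # total length in tokens, prompt included
        top_k=50,               # sample only from the 50 most likely tokens
        top_p=0.7,              # nucleus sampling: keep 70% of the probability mass
        do_sample=True,         # sample instead of greedy decoding
        temperature=0.75,       # below 1 sharpens the distribution
        repetition_penalty=10)  # strongly discourage repeated tokens
    # the decoded string includes the prompt followed by the continuation
    return tokenizer.decode(out_tokens[0])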

Experiment

I predict that the generated sentences will look more alike as the conditioning sequence gets longer.

def experiment(text):
    """condition gpt-2 generation on progressively longer starting sequences
    args:
        text: string language input
    prints:
        one gpt-2 sample per conditioning length
    """

    words = text.split(' ')
    for conditioning_len in range(1, len(words) + 1):
        # condition on the first `conditioning_len` words only
        conditioning_seq = ' '.join(words[:conditioning_len])
        print('=' * 32)
        print(f'{conditioning_len}:', speak(conditioning_seq))

experiment('AI engineering demands a rigorous examination of every variable involved.')

experiment('Old McDonald had a farm')
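
The prediction above can also be checked with a number rather than by eye. The following is a minimal sketch, not part of the original notebook: `mean_pairwise_similarity` and `similarity_by_prefix_length` are hypothetical helpers, character-level `difflib.SequenceMatcher` is just one possible similarity measure, and `echo` is a deterministic stand-in that only shows the call shape. It draws several samples per prefix length and scores how alike their continuations are, with the echoed prompt removed so that the shared prefix does not inflate the score.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(texts):
    """Average SequenceMatcher ratio over all pairs (1.0 means identical)."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 1.0  # fewer than two texts: nothing to compare
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def similarity_by_prefix_length(text, generate, n_samples=3):
    """Score how alike repeated samples are for each prefix length.

    `generate` is any callable that maps a prompt string to generated text.
    """
    words = text.split(' ')
    scores = []
    for n in range(1, len(words) + 1):
        prefix = ' '.join(words[:n])
        samples = [generate(prefix) for _ in range(n_samples)]
        # Drop the echoed prompt so only the continuations are compared.
        continuations = [s[len(prefix):] if s.startswith(prefix) else s
                         for s in samples]
        scores.append((n, mean_pairwise_similarity(continuations)))
    return scores


# Deterministic stand-in for a real generator (assumption, for illustration).
def echo(prompt):
    return prompt + ' and so on'


print(similarity_by_prefix_length('Old McDonald had a farm', echo))
```

In the notebook, passing `speak` in place of `echo` would score real GPT-2 samples. Whether the scores actually rise with prefix length is an empirical question this sketch does not answer.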

Conclusions

Because the transformer samples its output at random, I cannot predict it precisely. However, the next generated token has high mutual information with the previous tokens. If I exercise more control over the previous tokens, then I exercise more control over the following tokens. This is control by narrowing options: the set of reasonable continuations shrinks as the priors accumulate. Finally, this is comparable to the behavior exhibited by intelligent beings. Although their internal dynamics are usually unknown, predictions of their behavior are refined with time. Like GPT-2, longtime friends can even model each other's sentences!
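
The "narrowing options" argument can be made concrete with a toy count-based model. The sketch below is an illustration under stated assumptions, not a measurement of GPT-2: the corpus is a made-up handful of sentences, and `conditional_entropy` is a hypothetical helper. It estimates the entropy of the next token given the previous k tokens. The gap between the k = 0 value and a larger-k value is the empirical mutual information between the context and the next token, and because every estimate uses the same positions, the entropy cannot increase as k grows.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(tokens, k, max_k):
    """Empirical H(next token | previous k tokens), in bits.

    Positions start at max_k so every k is estimated on the same samples.
    """
    counts = defaultdict(Counter)
    for i in range(max_k, len(tokens)):
        context = tuple(tokens[i - k:i])  # empty tuple when k == 0
        counts[context][tokens[i]] += 1
    total = sum(sum(nexts.values()) for nexts in counts.values())
    h = 0.0
    for nexts in counts.values():
        n_ctx = sum(nexts.values())
        for n in nexts.values():
            # joint probability times log of the conditional probability
            h -= (n / total) * math.log2(n / n_ctx)
    return h


# Made-up toy corpus (assumption, for illustration only).
tokens = ('the cat sat on the mat . the cat ate the fish . '
          'the dog sat on the rug .').split(' ')

max_k = 3
for k in range(max_k + 1):
    print(k, round(conditional_entropy(tokens, k, max_k), 3))
```

Fewer bits of entropy mean fewer plausible next tokens, which is the same effect the experiment probes by lengthening the prompt.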