Checkpoints capture the exact value of all parameters used by a model. Checkpoints do not contain any description of the computation defined by the model, and are thus typically only useful when source code that can use the saved parameter values is available.
Note: Please read about the AutoTokenizer example in this earlier week-1 blogpost: [Hugging Face + FastAI - Session 1 - Ravi Chandra Veeramachaneni](https://ravichandraveeramachaneni.github.io/posts/bp7/)

* To create a model from a pretrained checkpoint we can use the code snippet below:

```
model = AutoModel.from_pretrained(checkpoint)
```

* The AutoModel's output will be a feature vector for each of the input tokens, called "hidden states" or "features". The features are simply what the model has learned from those input tokens.
* These features are then passed to a specific part of the model called the "head", and this head differs based on the task: summarization, text generation, etc.
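Putting these pieces together, here is a minimal sketch of the steps above, assuming a distilbert-base-uncased checkpoint purely for illustration:

```
from transformers import AutoTokenizer, AutoModel

checkpoint = "distilbert-base-uncased"  # assumed checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("Transformers are powerful.", return_tensors="pt")
outputs = model(**inputs)

# One hidden-state (feature) vector per input token:
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```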
The softmax function takes the raw predictions (logits) from the model as input and outputs them as probabilities between 0 and 1 that all sum up to one.
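As a quick sketch of that behaviour, using PyTorch's built-in softmax on some made-up logits:

```
import torch

logits = torch.tensor([2.0, 1.0, 0.1])  # made-up raw model outputs
probs = torch.softmax(logits, dim=-1)

print(probs)        # tensor([0.6590, 0.2424, 0.0986])
print(probs.sum())  # tensor(1.0000) -- the probabilities sum to one
```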
Cross-entropy loss applies a negative log to the probabilities. (Or, in simple terms) cross-entropy loss is the negative log likelihood applied to the log of the probabilities from the softmax function. Read more about the softmax function and cross-entropy loss in detail in this blogpost: Deep Learning for Coders / Chapter-5 / Week-6 - Ravi Chandra Veeramachaneni
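A small sketch of that equivalence, computed two ways in PyTorch (the logits and target are made up):

```
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # one example, three classes
target = torch.tensor([0])                # index of the correct class

# Built-in cross-entropy works directly on the raw logits:
loss = F.cross_entropy(logits, target)

# Manually: log of the softmax probabilities, then negative log likelihood:
manual = F.nll_loss(F.log_softmax(logits, dim=-1), target)

print(loss, manual)  # both ~0.4170
```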
Please check this github gist where I tried out all the steps above as showcased in the session: fastai+HF_week2_transformers_example.ipynb · GitHub
* The AutoTokenizer class creates the tokenizer for a given checkpoint, and it utilizes the from_pretrained method to do that. Please check this github gist where I have tried creating a tokenizer from scratch: fastai+HF_week2_Tokenizer_from_scratch.ipynb · GitHub
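For comparison with the from-scratch version in the gist, the usual AutoTokenizer route is only a couple of lines (the checkpoint name is an assumption for illustration):

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

encoded = tokenizer("Using a Transformer network is simple")
print(encoded["input_ids"])                    # the token ids
print(tokenizer.decode(encoded["input_ids"]))  # back to text, with special tokens added
```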
* The AutoModel class "can automatically guess the appropriate model architecture for your checkpoint, and then instantiates a model with this architecture." It likewise provides the from_pretrained method, which takes the checkpoint and outputs the model.
* A model can be saved with the save_pretrained method. This method will output two files: a config.json file, which has metadata such as the transformers version and checkpoint information, and a pytorch_model.bin file, which contains our model weights.
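A minimal sketch of the save/reload round trip (the directory name is arbitrary):

```
from transformers import AutoModel, AutoTokenizer

checkpoint = "distilbert-base-uncased"  # assumed checkpoint, for illustration only
model = AutoModel.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# save_pretrained writes config.json and the weights file into the directory:
model.save_pretrained("my_model_dir")
tokenizer.save_pretrained("my_model_dir")

# Later, both can be reloaded from that same directory:
model = AutoModel.from_pretrained("my_model_dir")
tokenizer = AutoTokenizer.from_pretrained("my_model_dir")
```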
* While tokenizing, we can truncate long sequences, and also we can specify the max_length to limit the sequence length (see the sketch after this list).
* Attention layers of the transformer model contextualize each token.
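As referenced in the list above, a quick sketch of truncation with max_length (checkpoint and values assumed for illustration):

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Anything longer than max_length tokens gets truncated:
inputs = tokenizer(
    "A very long sentence that we want to cut down to a fixed size for the model.",
    truncation=True,
    max_length=8,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)  # torch.Size([1, 8]) -- capped at max_length
```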