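This article uses the Llama2 model as an example to introduce how large language models work, how to fine-tune them, and notable large language models developed in China.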

1. GPT

GPT-related materials:

GPT-1: introduces the pre-training + fine-tuning method to NLP
Pre-training
- Transformer decoder
Fine-tuning (linear+softmax layer)
- Textual entailment
- Similarity
- Question Answering
- Commonsense Reasoning
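The pre-training objective and the linear+softmax fine-tuning head listed above can be sketched in plain Python. This is a toy sketch, not the GPT implementation: hidden states and probabilities are plain lists, the weights `W` and `b` are made-up numbers, and the auxiliary weight `lam` (the original GPT paper keeps the LM loss as an auxiliary term during fine-tuning) is an assumed value.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lm_loss(token_probs, targets):
    # Next-token objective: mean negative log-likelihood of each next token.
    # token_probs[t] is the decoder's distribution over the vocabulary at step t,
    # targets[t] is the index of the token that actually comes next.
    return -sum(math.log(p[y]) for p, y in zip(token_probs, targets)) / len(targets)

def linear(h, W, b):
    # y = h @ W + b, with W stored as len(h) rows of len(b) columns.
    return [sum(h[i] * W[i][j] for i in range(len(h))) + b[j] for j in range(len(b))]

def classification_loss(h_last, W, b, label):
    # Fine-tuning head: linear + softmax on the final hidden state of the last token,
    # trained with cross-entropy against the task label.
    probs = softmax(linear(h_last, W, b))
    return -math.log(probs[label]), probs

def finetune_loss(task_loss, aux_lm_loss, lam=0.5):
    # Task loss plus the LM loss kept as an auxiliary objective (lam is an assumed weight).
    return task_loss + lam * aux_lm_loss

# Toy numbers (hypothetical): vocabulary of 2 tokens, hidden size 3, two classes.
lm = lm_loss([[0.5, 0.5], [0.25, 0.75]], targets=[0, 1])  # next-token loss on the task text
task, probs = classification_loss([0.5, -1.0, 2.0],
                                  W=[[0.1, 0.2], [0.0, -0.3], [0.4, 0.1]],
                                  b=[0.0, 0.0], label=0)
total = finetune_loss(task, lm)
```

Only the small head (`W`, `b`) is new at fine-tuning time; the decoder that produces `h_last` is the pre-trained one.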

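GPT-2: a large and diverse dataset
- articles or fiction books (earlier work trained on a single domain like these)
- Common Crawl (much of it is mostly unintelligible)
  - instead of filtering it directly, keep outbound links from Reddit posts that received at least 3 karma (a heuristic for human-judged quality)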
- WebText
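The Reddit karma threshold from the list above can be sketched as a filter over (URL, karma) records. The `RedditPost` type and the example URLs are hypothetical, and the de-duplication step is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class RedditPost:
    # Hypothetical record: an outbound link shared on Reddit and the post's karma.
    url: str
    karma: int

def build_webtext_urls(posts, min_karma=3):
    # Keep outbound links whose post has at least `min_karma` karma,
    # de-duplicated while preserving first-seen order.
    seen = set()
    kept = []
    for post in posts:
        if post.karma >= min_karma and post.url not in seen:
            seen.add(post.url)
            kept.append(post.url)
    return kept

posts = [
    RedditPost("https://example.com/a", 5),
    RedditPost("https://example.com/b", 1),   # dropped: below the karma threshold
    RedditPost("https://example.com/a", 10),  # dropped: duplicate URL
    RedditPost("https://example.com/c", 3),
]
urls = build_webtext_urls(posts)  # ["https://example.com/a", "https://example.com/c"]
```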
Zero-shot or few-shot: the task is specified in the prompt, with no task-specific fine-tuning
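Zero-shot and few-shot use differ only in the prompt: the model is conditioned on a task description and, for few-shot, a handful of solved examples, with no gradient updates. A minimal sketch; the `=>`-separated format and the translation pairs are illustrative assumptions, not a required template.

```python
def build_prompt(task_description, query, examples=()):
    # Zero-shot when `examples` is empty, few-shot otherwise.
    # The model only sees this text; its weights are not updated.
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue after "=>"
    return "\n".join(lines)

zero_shot = build_prompt("Translate English to French:", "cheese")
few_shot = build_prompt(
    "Translate English to French:",
    "cheese",
    examples=[("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
)
```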

InstructGPT: align large language models with human intent
- fine-tuning with human feedback
- reinforcement learning on a dataset of human-ranked (sorted) outputs
The pre-training objective is designed to predict the next word in a sentence, which causes a misalignment between what the model optimizes and what humans need from it.
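For reference, the next-word objective discussed above is the standard autoregressive log-likelihood. For a token sequence x_1, ..., x_T and model parameters θ (notation assumed here), it reads as follows; nothing in it measures helpfulness or harmlessness, which is where the misalignment comes from.

```latex
\mathcal{L}_{\mathrm{LM}}(\theta) = -\sum_{t=1}^{T} \log P_\theta\left(x_t \mid x_{<t}\right)
```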
Three steps: supervised fine-tuning (SFT), reward model (RM) training, and RL fine-tuning with PPO
- Key points
  - how to label SFT data
  - how to label sorted (ranked) data
    - prompts are written by humans and also collected from the GPT playground
    - helpfulness is prioritized when labeling training data; truthfulness and harmlessness are prioritized for evaluation
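  - how to train the RM
    - pairwise ranking loss
      - ranking up to K = 9 responses per prompt is effective: it lowers labeling cost, and it also lowers evaluation cost, since each response is scored by the RM once rather than once per comparison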
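  - how to fine-tune the model with the RM
    - PPO-ptx
      - maximize the RM reward for the policy initialized from the SFT model, while keeping its original abilities (a KL penalty toward the SFT model plus mixed-in pretraining gradients)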
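The pairwise ranking loss and the PPO-ptx objective from the list above can be sketched with plain floats in place of model outputs. This is a simplified sketch under stated assumptions: the RM loss averages -log σ(r_w - r_l) over all C(K, 2) ordered pairs of one ranking; `beta` and `gamma` are placeholder coefficients; and only the objective value is computed, not the clipped PPO update itself.

```python
import math
from itertools import combinations

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def pairwise_ranking_loss(rewards_best_to_worst):
    # RM loss for one prompt: K responses ranked by a labeler (best first), scored by the RM.
    # combinations() preserves order, so in each pair the first item is the preferred
    # response r_w and the second is the rejected response r_l.
    pairs = list(combinations(rewards_best_to_worst, 2))
    loss = -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)
    return loss, len(pairs)

def ppo_ptx_objective(samples, pretrain_logps, beta=0.02, gamma=1.0):
    # Monte-Carlo estimate of the PPO-ptx objective (to be maximized).
    # samples: (rm_reward, logp_rl, logp_sft) per sampled response, where logp_* is the
    # response's log-probability under the RL policy and under the frozen SFT model.
    # pretrain_logps: log-likelihoods of pretraining text under the RL policy.
    # beta and gamma are placeholder values, not tuned settings.
    rl_term = sum(r - beta * (lp_rl - lp_sft) for r, lp_rl, lp_sft in samples) / len(samples)
    ptx_term = gamma * sum(pretrain_logps) / len(pretrain_logps)
    return rl_term + ptx_term

rewards = [4.0, 3.0, 2.5, 2.0, 1.0, 0.5, 0.0, -1.0, -2.0]  # K = 9, sorted best-first
rm_loss, n_pairs = pairwise_ranking_loss(rewards)         # n_pairs = C(9, 2) = 36
objective = ppo_ptx_objective([(1.0, -5.0, -5.5), (0.5, -4.0, -4.0)], [-3.0, -2.0])
```

Scoring all K responses once and forming the C(K, 2) pairs from those scores is what keeps the RM's evaluation cost low compared with one forward pass per comparison.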