* Improve interactive mode's coherence after EOS
Aims to improve coherence and the ability to resume the interactive session when control is returned to the user after an end-of-text token is reached.
Not sure what token 13 is or why injecting it seems to help (the follow-up commits below identify it as the newline token). See the PR conversation for examples.
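A minimal sketch of the idea, with illustrative names rather than the exact main.cpp code: when the sampled token is end-of-text in interactive mode, substitute token 13 so the transcript stays coherent and the session can resume.
```cpp
// Minimal sketch (illustrative, not the exact main.cpp code): replace an
// end-of-text token with token 13 when running interactively, so the
// session can be resumed coherently instead of stopping.
using llama_token = int;

static const llama_token TOKEN_EOS = 2;  // LLaMA end-of-text id
static const llama_token TOKEN_13  = 13; // empirically helps; the follow-up
                                         // commits below treat it as newline

llama_token filter_eos(llama_token id, bool interactive) {
    return (interactive && id == TOKEN_EOS) ? TOKEN_13 : id;
}
```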
* Make newline token a constant
* dynamically determine newline token
* relocate previous newline token const
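A hedged sketch of what "dynamically determine" means here: tokenize `"\n"` once at startup and use the resulting id instead of hard-coding 13. The `tokenize` declaration below is a hypothetical stand-in, since the exact tokenizer entry point changed across these commits.
```cpp
#include <string>
#include <vector>

using llama_token = int;

// Hypothetical stand-in for the tokenizer entry point in scope at the time.
std::vector<llama_token> tokenize(const std::string & text, bool add_bos);

// Determine the newline token id once, rather than assuming it is 13.
llama_token find_newline_token() {
    const std::vector<llama_token> toks = tokenize("\n", /*add_bos=*/false);
    return toks.back(); // the last token of "\n" is the newline id
}
```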
* cleanup whitespace
* print a new line on end of text in interactive
This may need further investigation when not using a reverse prompt.
* only print manual newline with reverse prompt
Fix the formatting of reverse prompts so they don't end up appended to the current line, while not introducing unnecessary newlines otherwise.
* alternate approach to replace end of text tokens
* Inject the reverse prompt again after eos in interactive mode
* tokenize reverse prompt when needed
makes this PR compatible with https://github.com/ggerganov/llama.cpp/pull/330
* tokenize and inject only first reverse prompt
thanks to tjohnman
* tokenize first reverse prompt once
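A hedged sketch of the approach these commits converge on (identifiers are illustrative, not the exact main.cpp names): tokenize only the first reverse prompt, once, and append it to the pending tokens whenever end-of-text is hit in interactive mode.
```cpp
#include <string>
#include <vector>

using llama_token = int;

// Hypothetical stand-in for the tokenizer entry point (see above).
std::vector<llama_token> tokenize(const std::string & text, bool add_bos);

// On end-of-text in interactive mode, re-inject the *first* reverse prompt,
// tokenized only once, so the model sees the user's turn marker again
// instead of stopping cold (idea credited to tjohnman).
void inject_first_antiprompt(const std::vector<std::string> & antiprompts,
                             std::vector<llama_token> & embd_inp) {
    if (antiprompts.empty()) {
        return; // tokenize nothing if no reverse prompt was given
    }
    static const std::vector<llama_token> first =
        tokenize(antiprompts.front(), /*add_bos=*/false);
    embd_inp.insert(embd_inp.end(), first.begin(), first.end());
}
```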
* add newline token
* add newline token
* tokenize/inject reverse prompt for refactor
This doesn't seem right, though.
* tokenize nothing for antiprompt if no reverse prompt is given
* Update main.cpp
* Update main.cpp
* tokenize and inject reverse prompt as needed
This doesn't seem to work if the reverse prompt is tokenized earlier, outside this code path.
* not needed
* remove newline token
* remove newline token
* tokenize newline token
* add space to comment
* Update main.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Slaren <2141330+slaren@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Revert "Delete SHA256SUMS for now (#416)"
This reverts commit 8eea5ae0e5.
* Remove ggml files until they can be verified
* Remove alpaca json
* Also add model/tokenizer.model to SHA256SUMS and update README
---------
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
* Update custom.md
* Removed Model section as it is better placed in README.md
* Updates to README.md model section
* Inserted the text about obtaining models from FB that was removed from the issue template, along with links to papers describing the various models
* Removed the IPFS download links for the Alpaca 7B models, as these appear to be in the old data format and probably shouldn't be directly linked to anyway
* Updated the perplexity section to point at the perplexity scores discussion (#406)
* Deduplicate q4 quantization functions
* Use const; add basic test
* Re-enable quantization test
* Disable AVX2 flags in CI
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Don't force immediate interactive without -i
Sometimes we might want to use a reverse prompt but we want to let the
model generate tokens right after the initial prompt. So we don't force
user input mode if the -i flag wasn't specified and instead let it run
until we encounter the reverse prompt.
This gives us some more flexibility, since it doesn't force the user to
enter a newline if they want to let the model generate text right after
the initial prompt and only be asked for input if the reverse prompt is
encountered.
The `--interactive-first` flag is reintroduced to force the old
behavior. `-r` behaves like `-i` plus introduces a reverse prompt (it
can be specified more than once).
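A sketch of the resulting flag semantics; the struct and field names below are assumptions for illustration, not necessarily the exact `gpt_params` members:
```cpp
#include <string>
#include <vector>

// Illustrative parameter struct mirroring the behavior described above.
struct params_t {
    bool interactive       = false; // enter interactive mode
    bool interactive_start = false; // ask for user input immediately
    std::vector<std::string> antiprompt;
};

// -i enables interactive mode but lets the model generate first;
// --interactive-first restores the old ask-immediately behavior;
// -r implies -i and may be given multiple times.
void apply_flag(params_t & p, const std::string & arg, const std::string & value = "") {
    if (arg == "-i" || arg == "--interactive") {
        p.interactive = true;
    } else if (arg == "--interactive-first") {
        p.interactive       = true;
        p.interactive_start = true;
    } else if (arg == "-r" || arg == "--reverse-prompt") {
        p.interactive = true;          // -r behaves like -i ...
        p.antiprompt.push_back(value); // ... plus adds a reverse prompt
    }
}
```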
* Update help output.
---------
Co-authored-by: Johnman <tjohnman@github>
* Major refactoring - introduce C-style API
* Clean up
* Add <cassert>
* Add <iterator>
* Add <algorithm> ....
* Fix timing reporting and accumulation
* Measure eval time only for single-token calls
* Change llama_tokenize return meaning
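A usage sketch of the refactored C-style API. Per the commit above, `llama_tokenize` now reports the number of tokens written, with a negative result signaling that the buffer was too small and encoding the required size; treat the exact signature as an assumption based on the `llama.h` of that era.
```cpp
#include <algorithm>
#include <string>
#include <vector>

#include "llama.h" // the new C-style API header

// Tokenize with automatic retry when the initial buffer guess is too small,
// relying on the changed return meaning (negative = required token count).
static std::vector<llama_token> tokenize_retry(llama_context * ctx,
                                               const std::string & text,
                                               bool add_bos) {
    std::vector<llama_token> tokens(text.size() + (add_bos ? 1 : 0));
    int n = llama_tokenize(ctx, text.c_str(), tokens.data(), (int) tokens.size(), add_bos);
    if (n < 0) {
        tokens.resize(-n); // the negative result encodes the size needed
        n = llama_tokenize(ctx, text.c_str(), tokens.data(), (int) tokens.size(), add_bos);
    }
    tokens.resize(std::max(n, 0));
    return tokens;
}
```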
about: Used to report issues and request enhancements for llama.cpp
title: "[User] Insert summary of your issue or enhancement.."
labels: ''
assignees: ''
Please answer the following questions for yourself before submitting an issue.
# Expected Behavior
Please provide a detailed written description of what you were trying to do, and what you expected `llama.cpp` to do.
# Current Behavior
Please provide a detailed written description of what `llama.cpp` did, instead.
# Environment and Context
```
$ make --version
$ g++ --version
```
# Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.
Also, please try to **avoid using screenshots** if at all possible. Instead, copy/paste the console output and use [GitHub's markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) to cleanly format your logs for easy readability.
### Obtaining and verifying the original Facebook LLaMA model and Stanford Alpaca model data
* The LLaMA models are officially distributed by Facebook and will never be provided through this repository. See this [pull request in Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to obtain access to the model data.
* Please verify the sha256 checksums of all downloaded model files to confirm that you have the correct model data files before creating an issue relating to your model files.
* The following command will verify that you have all of the latest model files in your self-installed `./models` subdirectory:
  * on Linux: `sha256sum --ignore-missing -c SHA256SUMS`
  * on macOS: `shasum -a 256 --ignore-missing -c SHA256SUMS`
* If your issue is with model generation quality then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
* LLaMA:
* [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
* [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
* GPT-3
* [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
* GPT-3.5 / InstructGPT / ChatGPT:
* [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
* [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
### Perplexity (Measuring model quality)
You can pass `--perplexity` as a command line option to measure perplexity over the given prompt. For more background,
see https://huggingface.co/docs/transformers/perplexity. In general, lower perplexity is better for LLMs.
#### Latest measurements
The latest perplexity scores for the various model sizes and quantizations are being tracked in [discussion #406](https://github.com/ggerganov/llama.cpp/discussions/406). `llama.cpp` measures very well
compared to the baseline implementations. Quantization has a small negative impact on quality, but, as you can see, running
13B at q4_0 beats the 7B f16 model by a significant amount.
All measurements are done against the wikitext2 test dataset (https://paperswithcode.com/dataset/wikitext-2), with default options (512-token context length).
Note that changing the context length has a significant impact on perplexity (longer context = better perplexity).
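For reference, the quantity being measured over a tokenized text $x_1, \dots, x_N$ is the standard perplexity (the general definition, not anything llama.cpp-specific):

$$
\mathrm{PPL}(x) = \exp\left(-\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i})\right)
$$

Lower values mean the model assigns a higher average probability to each true next token, which is why lower is better.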
- There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
- Clean-up any trailing whitespaces, use 4 spaces indentation, brackets on same line, `void * ptr`, `int & a`
- See [good first issues](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) for tasks suitable for first contributions