Sick Gaming
[Tut] MPT-7B: A Free Open-Source Large Language Model (LLM) - Printable Version




[Tut] MPT-7B: A Free Open-Source Large Language Model (LLM) - xSicKxBot - 05-19-2023

MPT-7B: A Free Open-Source Large Language Model (LLM)

<div>
<div class="kk-star-ratings kksr-auto kksr-align-left kksr-valign-top" data-payload='{&quot;align&quot;:&quot;left&quot;,&quot;id&quot;:&quot;1370322&quot;,&quot;slug&quot;:&quot;default&quot;,&quot;valign&quot;:&quot;top&quot;,&quot;ignore&quot;:&quot;&quot;,&quot;reference&quot;:&quot;auto&quot;,&quot;class&quot;:&quot;&quot;,&quot;count&quot;:&quot;1&quot;,&quot;legendonly&quot;:&quot;&quot;,&quot;readonly&quot;:&quot;&quot;,&quot;score&quot;:&quot;5&quot;,&quot;starsonly&quot;:&quot;&quot;,&quot;best&quot;:&quot;5&quot;,&quot;gap&quot;:&quot;5&quot;,&quot;greet&quot;:&quot;Rate this post&quot;,&quot;legend&quot;:&quot;5\/5 - (1 vote)&quot;,&quot;size&quot;:&quot;24&quot;,&quot;title&quot;:&quot;MPT-7B: A Free Open-Source Large Language Model (LLM)&quot;,&quot;width&quot;:&quot;142.5&quot;,&quot;_legend&quot;:&quot;{score}\/{best} - ({count} {votes})&quot;,&quot;font_factor&quot;:&quot;1.25&quot;}'>
<div class="kksr-stars">
<div class="kksr-stars-inactive">
<div class="kksr-star" data-star="1" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="2" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="3" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="4" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="5" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
</p></div>
<div class="kksr-stars-active" style="width: 142.5px;">
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
</p></div>
</div>
<div class="kksr-legend" style="font-size: 19.2px;"> 5/5 – (1 vote) </div>
</p></div>
<p>MPT-7B is a <a rel="noreferrer noopener" href="https://blog.finxter.com/the-evolution-of-large-language-models-llms-insights-from-gpt-4-and-beyond/" data-type="post" data-id="1267220" target="_blank">large language model (LLM)</a> developed by <a rel="noreferrer noopener" href="https://www.mosaicml.com/blog/mpt-7b" data-type="URL" data-id="https://www.mosaicml.com/blog/mpt-7b" target="_blank">MosaicML</a> as a new standard for open-source, commercially usable LLMs, and a groundbreaking innovation in natural language processing technology. </p>
<p>With nearly 7 billion parameters, MPT-7B offers impressive performance and has been trained on a diverse dataset of 1 trillion tokens, including text and code. As a part of the MosaicPretrainedTransformer (MPT) family, it utilizes a modified transformer architecture, optimized for efficient training and inference, setting a new standard for open-source, commercially usable language models.</p>
<p>MosaicML achieved an impressive feat by training MPT-7B on their platform in just 9.5 days, with zero human intervention, at a cost of around $200,000. The model matches the quality of Meta’s LLaMA-7B while remaining open source, making it well suited for commercial use.</p>
<p>MPT-7B’s lineup includes various specialized models like MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, each catering to different use cases. By offering powerful performance and extensive functionality, MPT-7B emerges as a leading contender in the global LLM landscape.</p>
<h2 class="wp-block-heading">MPT-7B Huggingface</h2>
<p>MPT-7B is a large language model developed by MosaicML and available on <a href="https://huggingface.co/mosaicml/mpt-7b" target="_blank" rel="noreferrer noopener">Hugging Face</a> for easy use. It is designed for efficient training and inference, is suitable for commercial use, and outperforms other open-source models in the 7B–20B range on various benchmarks.</p>
<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="169" src="https://blog.finxter.com/wp-content/uploads/2023/05/image-246-1024x169.png" alt="" class="wp-image-1370400" srcset="https://blog.finxter.com/wp-content/uploads/2023/05/image-246-1024x169.png 1024w, https://blog.finxter.com/wp-content/uploads/2023/05/image-246-300x50.png 300w, https://blog.finxter.com/wp-content/uploads/2023/05/image-246-768x127.png 768w, https://blog.finxter.com/wp-content/uploads/2023/05/image-246.png 1531w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
</div>
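<p>To illustrate how accessible the checkpoint is, here is a minimal sketch of loading MPT-7B with the Hugging Face <code>transformers</code> library. It assumes you have <code>transformers</code> and <code>torch</code> installed and enough memory for a ~7B-parameter model; the MPT repos ship custom modeling code, so <code>trust_remote_code=True</code> is needed, and exact arguments may differ across library versions.</p>
<pre class="wp-block-code"><code>
# Minimal sketch: loading MPT-7B from the Hugging Face Hub.
# Assumes transformers + torch are installed and enough memory for a ~7B model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b"

# The MPT repos ship custom modeling code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # roughly halves memory vs. float32
    trust_remote_code=True,
)

prompt = "MPT-7B is an open-source large language model that"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
</code></pre>
<p>Generation on CPU will be slow for a 7B model; a GPU with roughly 16 GB of memory is a more realistic target when loading in bfloat16.</p>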
<h2 class="wp-block-heading">LLM</h2>
<p>As a large language model (LLM), MPT-7B is trained from scratch on <a href="https://www.mosaicml.com/blog/mpt-7b" target="_blank" rel="noreferrer noopener">1T tokens</a> of text and code. It utilizes a modified transformer architecture for better efficiency and matches the quality of other LLMs while being open-source.</p>
<h2 class="wp-block-heading">Comparison to Other LLMs</h2>
<p>The MPT-7B is an impressive large language model (LLM) that demonstrates performance comparable to the LLaMA-7B model, and it even outpaces other open-source models ranging from 7B to 20B parameters on standard academic tasks. (<a href="https://www.mosaicml.com/blog/mpt-7b" data-type="URL" data-id="https://www.mosaicml.com/blog/mpt-7b" target="_blank" rel="noreferrer noopener">source</a>)</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="642" src="https://blog.finxter.com/wp-content/uploads/2023/05/image-249-1024x642.png" alt="" class="wp-image-1370425" srcset="https://blog.finxter.com/wp-content/uploads/2023/05/image-249-1024x642.png 1024w, https://blog.finxter.com/wp-content/uploads/2023/05/image-249-300x188.png 300w, https://blog.finxter.com/wp-content/uploads/2023/05/image-249-768x482.png 768w, https://blog.finxter.com/wp-content/uploads/2023/05/image-249.png 1231w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
<p>Quality evaluations involving a compilation of 11 open-source benchmarks commonly used for in-context learning (ICL), in addition to a self-curated Jeopardy benchmark to test factual accuracy in responses, demonstrate the robust performance of MPT-7B. </p>
<p>Remarkably, zero-shot accuracy comparisons between MPT-7B, LLaMA-7B, and other open-source models revealed that MPT-7B and LLaMA-7B share a similar level of quality across all tasks, with each model earning the highest scores on 6 out of the 12 tasks. </p>
<p>Despite their comparable performance, MPT-7B and LLaMA-7B noticeably surpass other open-source language models, including those with substantially larger parameter counts. </p>
<p>These results, made possible through the MosaicML LLM Foundry’s ICL evaluation framework, are of particular importance as they were achieved under fair and consistent conditions without the use of prompt strings or prompt tuning. </p>
<p>Furthermore, this evaluation suite brings with it an invitation to the community to engage in model evaluations and contribute additional datasets and ICL task types for continued advancements in the evaluation process.</p>
<p>I also found a nice video on the model; check it out right here:</p>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/mpt-7b-llm-quick-guide/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FPnMkZGf-ZYk%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure>
<h2 class="wp-block-heading">Commercial Use and Licences</h2>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="712" height="1024" src="https://blog.finxter.com/wp-content/uploads/2023/05/image-247-712x1024.png" alt="" class="wp-image-1370406" srcset="https://blog.finxter.com/wp-content/uploads/2023/05/image-247-712x1024.png 712w, https://blog.finxter.com/wp-content/uploads/2023/05/image-247-209x300.png 209w, https://blog.finxter.com/wp-content/uploads/2023/05/image-247.png 757w" sizes="(max-width: 712px) 100vw, 712px" /></figure>
<p>MPT-7B is released under the <strong>Apache 2.0</strong>, <strong>CC-By-SA-3.0</strong>, and <strong>CC-By-SA-4.0</strong> licenses on <a rel="noreferrer noopener" href="https://huggingface.co/mosaicml/mpt-7b" data-type="URL" data-id="https://huggingface.co/mosaicml/mpt-7b" target="_blank">Huggingface</a> (not GitHub, to my knowledge), making it usable for commercial applications, subject to the terms of the respective license: </p>
<ol>
<li><strong>Apache 2.0</strong>: An open-source software license that permits users to freely use, modify, and distribute the licensed work, while also providing an explicit grant of patent rights from contributors to users.</li>
<li><strong>CC-BY-SA-3.0</strong>: Creative Commons Attribution-ShareAlike 3.0 is a license that allows for free distribution, remixing, tweaking, and building upon a work, even commercially, as long as the new creation is credited and licensed under identical terms.</li>
<li><strong>CC-BY-SA-4.0</strong>: An updated version of the Creative Commons Attribution-ShareAlike license. Like 3.0, it allows anyone to remix, adapt, and build upon a work, even for commercial purposes, provided they credit the original creation and license their new creations under identical terms; it adds improvements in internationalization and adaptability to new technologies compared to its predecessor.</li>
</ol>
<h2 class="wp-block-heading">Chat</h2>
<div class="wp-block-image">
<figure class="aligncenter size-large"><img decoding="async" loading="lazy" width="1024" height="697" src="https://blog.finxter.com/wp-content/uploads/2023/05/image-248-1024x697.png" alt="" class="wp-image-1370423" srcset="https://blog.finxter.com/wp-content/uploads/2023/05/image-248-1024x697.png 1024w, https://blog.finxter.com/wp-content/uploads/2023/05/image-248-300x204.png 300w, https://blog.finxter.com/wp-content/uploads/2023/05/image-248-768x523.png 768w, https://blog.finxter.com/wp-content/uploads/2023/05/image-248.png 1194w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
</div>
<p class="has-global-color-8-background-color has-background">The MPT-7B model has a specific version called <a rel="noreferrer noopener" href="https://sapling.ai/llm/llama-vs-mpt" target="_blank">MPT-7B-Chat</a> that is designed for conversational use cases, making it a great option for building chatbots and virtual assistants.</p>
<p>Here’s another sample chat from the <a href="https://www.mosaicml.com/blog/mpt-7b" data-type="URL" data-id="https://www.mosaicml.com/blog/mpt-7b" target="_blank" rel="noreferrer noopener">original website</a>:</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="724" height="1024" src="https://blog.finxter.com/wp-content/uploads/2023/05/image-253-724x1024.png" alt="" class="wp-image-1370464" srcset="https://blog.finxter.com/wp-content/uploads/2023/05/image-253-724x1024.png 724w, https://blog.finxter.com/wp-content/uploads/2023/05/image-253-212x300.png 212w, https://blog.finxter.com/wp-content/uploads/2023/05/image-253-768x1086.png 768w, https://blog.finxter.com/wp-content/uploads/2023/05/image-253-1087x1536.png 1087w, https://blog.finxter.com/wp-content/uploads/2023/05/image-253.png 1228w" sizes="(max-width: 724px) 100vw, 724px" /></figure>
<h2 class="wp-block-heading">Storywriter 65K</h2>
<p>I was always frustrated with ChatGPT’s length limitations. StoryWriter 65k is a nice open-source solution to that problem! 🥳</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="604" src="https://blog.finxter.com/wp-content/uploads/2023/05/image-250-1024x604.png" alt="" class="wp-image-1370455" srcset="https://blog.finxter.com/wp-content/uploads/2023/05/image-250-1024x604.png 1024w, https://blog.finxter.com/wp-content/uploads/2023/05/image-250-300x177.png 300w, https://blog.finxter.com/wp-content/uploads/2023/05/image-250-768x453.png 768w, https://blog.finxter.com/wp-content/uploads/2023/05/image-250.png 1221w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
<p class="has-global-color-8-background-color has-background">MPT-7B has a <a rel="noreferrer noopener" href="https://sapling.ai/llm/llama-vs-mpt" target="_blank">StoryWriter variant</a> that focuses on generating coherent and engaging stories. This StoryWriter version is an excellent choice for content generation tasks. The <a href="https://sapling.ai/llm/llama-vs-mpt">MPT-7B-StoryWriter-65k+ version</a> is designed to handle even longer stories, suitable for applications requiring extended narrative output.</p>
<p>Here’s an example prompt (<a href="https://www.mosaicml.com/blog/mpt-7b" data-type="URL" data-id="https://www.mosaicml.com/blog/mpt-7b" target="_blank" rel="noreferrer noopener">source</a>):</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="1020" src="https://blog.finxter.com/wp-content/uploads/2023/05/image-251-1024x1020.png" alt="" class="wp-image-1370458" srcset="https://blog.finxter.com/wp-content/uploads/2023/05/image-251-1024x1020.png 1024w, https://blog.finxter.com/wp-content/uploads/2023/05/image-251-300x300.png 300w, https://blog.finxter.com/wp-content/uploads/2023/05/image-251-150x150.png 150w, https://blog.finxter.com/wp-content/uploads/2023/05/image-251-768x765.png 768w, https://blog.finxter.com/wp-content/uploads/2023/05/image-251.png 1134w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
<h2 class="wp-block-heading">MPT-7B-Instruct</h2>
<p><a href="https://sapling.ai/llm/llama-vs-mpt" target="_blank" rel="noreferrer noopener">The Instruct version</a> of MPT-7B is optimized for providing detailed instructions and guidance based on user input, making it a perfect fit for instructional applications and virtual learning.</p>
<div class="wp-block-image">
<figure class="aligncenter size-large"><img decoding="async" loading="lazy" width="1024" height="633" src="https://blog.finxter.com/wp-content/uploads/2023/05/image-252-1024x633.png" alt="" class="wp-image-1370463" srcset="https://blog.finxter.com/wp-content/uploads/2023/05/image-252-1024x633.png 1024w, https://blog.finxter.com/wp-content/uploads/2023/05/image-252-300x185.png 300w, https://blog.finxter.com/wp-content/uploads/2023/05/image-252-768x475.png 768w, https://blog.finxter.com/wp-content/uploads/2023/05/image-252.png 1061w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
</div>
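<p>In practice, instruction-tuned models respond best when the prompt matches their fine-tuning format. The snippet below sketches the Alpaca/Dolly-style template that MPT-7B-Instruct is reported to use; the exact wording is an assumption, so confirm it against the model card before building on it.</p>
<pre class="wp-block-code"><code>
# Hedged sketch: the Alpaca/Dolly-style prompt template MPT-7B-Instruct is
# reported to be fine-tuned on; the exact wording is an assumption.
INSTRUCTION_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the template the Instruct variant expects."""
    return INSTRUCTION_TEMPLATE.format(instruction=instruction)

print(build_prompt("Summarize what makes MPT-7B commercially usable."))
</code></pre>
<p>Keeping the template in one place makes it easy to swap in the exact wording from the model card later without touching the rest of your code.</p>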
<h2 class="wp-block-heading">Context Length</h2>
<p>MPT-7B models handle different context lengths depending on the variant and use case. The base model is trained with a 2,048-token context window, but because MPT uses ALiBi instead of learned positional embeddings, the window can be extended at fine-tuning or inference time; the StoryWriter-65k+ variant pushes it to 65,000+ tokens. Longer context lengths allow for better understanding and more accurate responses in conversational scenarios.</p>
<h2 class="wp-block-heading">Tokens, Meta, and Datasets</h2>
<p>MPT-7B was trained on 1T tokens drawn from various data sources, such as the <a rel="noreferrer noopener" href="https://sapling.ai/llm/bloom-vs-mpt" target="_blank">Books3 dataset</a> created by EleutherAI; its fine-tuned variants additionally use datasets such as Evol-Instruct.</p>
<p>Meta-information about MPT-7B, such as its architecture and training methodology, can be found in the <a href="https://www.mosaicml.com/blog/mpt-7b" data-type="URL" data-id="https://www.mosaicml.com/blog/mpt-7b" target="_blank" rel="noreferrer noopener">documentation</a>.</p>
<p>Datasets used for training MPT-7B include Books3, Alpaca, and Evol-Instruct, which cover different types of text content to create a diverse language model. </p>
<p>(<a href="https://www.mosaicml.com/blog/mosaicml-streamingdataset" data-type="URL" data-id="https://www.mosaicml.com/blog/mosaicml-streamingdataset" target="_blank" rel="noreferrer noopener">source</a>)</p>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="1198" height="630" src="https://blog.finxter.com/wp-content/uploads/2023/05/63e44fd358aa51103cf46c27_image8.gif" alt="" class="wp-image-1370475"/></figure>
</div>
<p>You can check out their great GitHub repository <a rel="noreferrer noopener" href="https://github.com/mosaicml/streaming" data-type="URL" data-id="https://github.com/mosaicml/streaming" target="_blank">MosaicML Streaming</a> to train your LLMs easily from cloud storage (multi-node, distributed training for large models)!</p>
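<p>As a hedged sketch of what using that library looks like, the snippet below converts a handful of text samples to the MDS shard format and then streams them back during training. The bucket path and column types are illustrative assumptions; see the MosaicML Streaming README for the authoritative API.</p>
<pre class="wp-block-code"><code>
# Hedged sketch of the MosaicML "streaming" library: convert text samples to the
# MDS shard format, then stream them from (cloud) storage during training.
# The bucket path and column spec below are illustrative assumptions.
from streaming import MDSWriter, StreamingDataset
from torch.utils.data import DataLoader

# 1) One-time conversion: write samples as MDS shards (local dir or s3:// URI).
columns = {"text": "str"}
with MDSWriter(out="s3://my-bucket/my-dataset", columns=columns) as writer:
    for sample in ({"text": "first document"}, {"text": "second document"}):
        writer.write(sample)

# 2) Training time: stream shards from the remote store into a local cache.
dataset = StreamingDataset(
    remote="s3://my-bucket/my-dataset", local="/tmp/mds-cache", shuffle=True
)
loader = DataLoader(dataset, batch_size=8)
for batch in loader:
    pass  # tokenize batch["text"] and feed it to your LLM training loop here
</code></pre>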
<h2 class="wp-block-heading">Access</h2>
<p>MPT-7B is easy to access through its <a href="https://huggingface.co/mosaicml/mpt-7b" target="_blank" rel="noreferrer noopener">Hugging Face implementation</a>, making it straightforward to deploy and integrate into various projects and applications.</p>
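<p>For quick, throwaway experiments, the high-level <code>pipeline</code> API may be the shortest path. A minimal sketch, assuming <code>trust_remote_code</code> is forwarded to the underlying loaders and that <code>accelerate</code> is installed for <code>device_map="auto"</code>; exact keyword arguments vary across transformers versions:</p>
<pre class="wp-block-code"><code>
# Hedged sketch: one-off text generation via the high-level pipeline API.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mosaicml/mpt-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # forwarded to the underlying from_pretrained calls
    device_map="auto",       # assumes `accelerate` is installed; CPU also works, slowly
)
print(generator("Open-source LLMs matter because", max_new_tokens=40)[0]["generated_text"])
</code></pre>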
<h2 class="wp-block-heading">Benchmarks</h2>
<p>MPT-7B has been benchmarked against several other large language models and matches the performance of LLaMA, as shown above, while being open-source and commercially friendly.</p>
<p>Unfortunately, I didn’t find any independent benchmarks that weren’t provided by MPT-7B’s creator, MosaicML. More research is definitely needed! If you’re an ML researcher, why not fill this research gap?</p>
<h2 class="wp-block-heading">Databricks Dolly-15K, Sharegpt-Vicuna, HC3, Anthropic Helpful and Harmless Datasets</h2>
<p>MPT-7B’s fine-tuned variants (notably MPT-7B-Chat) build on datasets such as Databricks Dolly-15K, ShareGPT-Vicuna, HC3, and Anthropic’s Helpful and Harmless datasets.</p>
<h2 class="wp-block-heading">Pricing</h2>
<p>While there is no direct pricing associated with MPT-7B, users may experience costs associated with infrastructure, compute resources, and deployment depending on their requirements.</p>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2665.png" alt="♥" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Thanks for reading the article! Feel free to join 100,000 coders in my <a rel="noreferrer noopener" href="https://blog.finxter.com/subscribe/" data-type="page" data-id="1414" target="_blank">free email newsletter on AI and exponential technologies</a> such as blockchain development and Python! </p>
<p>Also, you can download a fun cheat sheet here:</p>
<h2 class="wp-block-heading">OpenAI Glossary Cheat Sheet (100% Free PDF Download) <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f447.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></h2>
<p>Finally, check out our free cheat sheet on OpenAI terminology, many Finxters have told me they love it! <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2665.png" alt="♥" class="wp-smiley" style="height: 1em; max-height: 1em;" /> </p>
<div class="wp-block-image">
<figure class="aligncenter size-full"><a href="https://blog.finxter.com/openai-glossary/" target="_blank" rel="noreferrer noopener"><img decoding="async" loading="lazy" width="720" height="960" src="https://blog.finxter.com/wp-content/uploads/2023/04/Finxter_OpenAI_Glossary-1.jpg" alt="" class="wp-image-1278472" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/Finxter_OpenAI_Glossary-1.jpg 720w, https://blog.finxter.com/wp-content/uploads/2023/04/Finxter_OpenAI_Glossary-1-225x300.jpg 225w" sizes="(max-width: 720px) 100vw, 720px" /></a></figure>
</div>
<p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/openai-glossary/" data-type="post" data-id="1276420" target="_blank" rel="noreferrer noopener">OpenAI Terminology Cheat Sheet (Free Download PDF)</a></p>
</div>


https://www.sickgaming.net/blog/2023/05/18/mpt-7b-a-free-open-source-large-language-model-llm/