Tokenization

Within the realm of artificial intelligence, tokenization refers to the process of breaking information down into small, recurring units of data called tokens.

One widely used tokenization algorithm is byte-pair encoding (BPE). It breaks text strings down into small groups of characters and assigns each group an ID, so that the text can be stored and processed efficiently by a computer.
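
To make this concrete, below is a minimal sketch of BPE training in Python. The corpus, function name, and merge count are illustrative assumptions for this page rather than part of any AI Starter tooling; production tokenizers learn tens of thousands of merges from far larger corpora.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn byte-pair-encoding merges from a list of words.

    Every word starts as a sequence of single-character tokens; on each
    step, the most frequent adjacent pair of tokens is merged into one
    new token.
    """
    # Represent each word as a tuple of tokens, counting duplicates.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent token pair across the whole corpus.
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # the most frequent pair
        merges.append(best)
        # Rewrite every word, replacing the pair with the merged token.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

corpus = ["ending", "meaning", "voting", "sing", "in", "ring"]
merges = train_bpe(corpus, num_merges=2)
print(merges)  # [('i', 'n'), ('in', 'g')] -- "in" merges first, then "ing"
```

Because "in" is the most frequent pair in this toy corpus, it becomes a token first, and "ing" emerges from it on the very next merge, which is exactly the pattern the example below walks through.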

Consider the combination of the three letters i-n-g. Each letter is an independent token; however, when put together to form “ing,” they take on a wholly different representation, one commonly found at the tail end of verbs in their present-participle or gerund forms (end-ing, mean-ing, vot-ing, etc.).

Taking this a step further, those same three characters also allow a multitude of two-letter combinations, such as “in,” “ng,” “ig,” and so on, and each combination would be its own unique token. By tokenizing the combinations of letters that recur most often, it becomes easier to recognize patterns in large datasets than it is by processing each letter individually. Take the pair “in”: when those two letters are detached from any other letters, they classify as the standalone word “in”; when they are embedded among other letters, the tokenizer can discard that word reading and segment the sentence structure more accurately, as the sketch below illustrates.
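
Continuing the hypothetical sketch above, replaying the learned `merges` over new text shows this context sensitivity directly: the pair “in” survives as a whole word only when it stands alone, and is otherwise absorbed into the larger “ing” token.

```python
def segment(word, merges):
    """Split a word into tokens by replaying the learned merges in order."""
    tokens = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

print(segment("in", merges))      # ['in'] -- the standalone word
print(segment("voting", merges))  # ['v', 'o', 't', 'ing']
```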
