The Crypto HODL

AI Doesn’t Just Read Texts, It “Sees” Them

October 21, 2025


Deepseek’s new OCR system processes text as images and compresses it by up to 10 times. This technology, capable of analyzing 33 million pages a day, allows AI to read much longer documents.

Deepseek, a Chinese artificial intelligence company, is attracting attention with its new OCR (Optical Character Recognition) system, developed for more efficient processing of text-based documents. The system compresses image-based text, enabling AI models to process much longer documents without hitting their memory limits.

Processing Text as Visual Data

According to Deepseek’s technical report, the system analyzes text data in image format instead of processing it directly. This approach significantly reduces the computational load. The new OCR system can compress text by up to 10 times while retaining 97% of the information.
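To make the compression figure concrete, here is a minimal sketch of the arithmetic. Only the 10x ratio and the 97% retention figure come from the report as quoted above; the function names and the 10,000-token example document are illustrative assumptions.

```python
import math

def vision_tokens_needed(text_tokens: int, compression_ratio: int = 10) -> int:
    """Estimate how many vision tokens replace `text_tokens` text tokens.

    The 10x default is the upper bound quoted in the article; real ratios vary.
    """
    if text_tokens < 0 or compression_ratio <= 0:
        raise ValueError("text_tokens must be >= 0 and compression_ratio > 0")
    return math.ceil(text_tokens / compression_ratio)

def tokens_retained(text_tokens: int, retention_percent: int = 97) -> int:
    """Rough count of text tokens recovered at the quoted 97% retention."""
    return text_tokens * retention_percent // 100

doc_tokens = 10_000  # hypothetical document size, chosen for illustration
print(vision_tokens_needed(doc_tokens))  # 1000
print(tokens_retained(doc_tokens))       # 9700
```

Under these assumptions, a 10,000-token document would occupy about 1,000 vision tokens of context, with roughly 9,700 tokens' worth of content recoverable.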

As is known, large language models represent text as tokens, with each token containing several characters. Researchers are working to develop models that can process long documents and conversations exceeding millions of tokens, thereby expanding the context window. However, as the number of tokens that can be processed simultaneously increases, so do the computational costs. A large token capacity therefore keeps the model’s memory from filling up even with long documents, but it raises the cost. Deepseek’s OCR solution, by contrast, processes very long content as if it were an image, effectively viewing the content as pixels.
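The cost argument can be illustrated with a rough sketch. It assumes standard self-attention, where every token is compared with every other token, so the work grows with the square of the sequence length. That scaling rule is an assumption added here for illustration; the article itself only says that costs rise with the token count.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token comparisons under plain self-attention (assumed model)."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens * num_tokens

text_tokens = 10_000                # hypothetical long document
vision_tokens = text_tokens // 10   # the article's "up to 10 times" compression

saving = attention_pairs(text_tokens) // attention_pairs(vision_tokens)
print(saving)  # 100
```

Under this assumed quadratic model, a 10x reduction in tokens would cut the pairwise attention work by about 100x, which is why shrinking the token count matters more than the raw ratio suggests.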

Seeing Long Texts as Pixels

The core of the system consists of two main components: DeepEncoder and Deepseek3B-MoE. DeepEncoder, which handles the image processing, operates with 380 million parameters. Deepseek3B-MoE, responsible for text generation, has 570 million active parameters. DeepEncoder combines Meta’s 80-million-parameter SAM (Segment Anything Model) and OpenAI’s 300-million-parameter CLIP model. An intermediary 16x compressor significantly reduces the image data, increasing processing speed. For example, the 4,096 tokens of a 1,024 × 1,024 pixel image are reduced to only 256 tokens after compression.
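The token counts in that example can be reproduced with a short sketch. The 16-pixel patch size is an assumption chosen because it yields the quoted 4,096 tokens for a 1,024 × 1,024 image; the function names are illustrative and are not taken from Deepseek’s code.

```python
def patch_tokens(width: int, height: int, patch_size: int = 16) -> int:
    """Count the patch tokens for an image split into square patches.

    patch_size=16 is an assumption that matches the article's 4,096 figure.
    """
    if width % patch_size or height % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    return (width // patch_size) * (height // patch_size)

def compress(tokens: int, factor: int = 16) -> int:
    """Apply the 16x token compressor described in the article."""
    return tokens // factor

raw = patch_tokens(1024, 1024)
print(raw)            # 4096
print(compress(raw))  # 256

# The two encoder parts quoted in the article add up to DeepEncoder's total.
sam_params, clip_params = 80_000_000, 300_000_000
print(sam_params + clip_params)  # 380000000
```

The numbers are internally consistent: 64 × 64 patches give 4,096 tokens, dividing by 16 leaves 256, and the 80-million and 300-million parameter parts sum to the 380 million reported for DeepEncoder.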

Deepseek OCR can operate using between 64 and 400 “vision tokens,” depending on the resolution. This significantly lightens operations that typically require thousands of tokens in classic OCR systems. In OmniDocBench tests, the system outperformed GOT-OCR 2.0 using only 100 vision tokens. It also surpassed the performance of MinerU 2.0, which required over 6,000 tokens, while working with fewer than 800 tokens.

The system, optimized for different document types, uses 64 tokens for simple presentations, 100 tokens for books and reports, and 800 tokens in a special mode called “Gundam mode” for complex newspapers. Deepseek OCR can process not only text but also complex visual elements like diagrams, chemical formulas, and geometric shapes. Additionally, it works in approximately 100 languages, can preserve formatting, and can generate plain text or general visual descriptions if desired.
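The per-document budgets can be written down as a simple lookup. This is only a sketch of the figures quoted above: the dictionary keys and the function name are hypothetical labels, not identifiers from Deepseek’s released code.

```python
# Vision-token budgets as quoted in the article; keys are illustrative labels.
VISION_TOKEN_BUDGETS = {
    "simple_presentation": 64,
    "book_or_report": 100,
    "complex_newspaper": 800,  # the article's "Gundam mode"
}

def token_budget(document_type: str) -> int:
    """Return the quoted vision-token budget for a document type."""
    try:
        return VISION_TOKEN_BUDGETS[document_type]
    except KeyError:
        raise ValueError(f"unknown document type: {document_type!r}") from None

for kind in VISION_TOKEN_BUDGETS:
    print(kind, token_budget(kind))
```

Even the largest budget here, 800 tokens, is well below the 6,000-plus tokens the article attributes to MinerU 2.0.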

Processes 33 Million Pages a Day

Approximately 30 million PDF pages were used to train the system. Of these, 25 million were English and Chinese documents. The training data also included 10 million synthetic diagrams, 5 million chemical formulas, and 1 million geometric shapes.

In real-world use, Deepseek OCR achieves very high throughput. The system can process over 200,000 pages a day on a single Nvidia A100 GPU. With 20 servers, each housing eight A100 GPUs, this capacity increases to 33 million pages per day. This speed has the potential to greatly facilitate the production of training data for new AI models. Both the code and model weights are publicly available.
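The cluster figure can be sanity-checked with a few lines of arithmetic. The sketch treats the per-GPU figure as pages per day, an assumption that makes the units line up with the 33 million total, and it takes “over 200,000” as a lower bound.

```python
SERVERS = 20
GPUS_PER_SERVER = 8
PAGES_PER_GPU_PER_DAY = 200_000    # lower bound: the article says "over 200,000"
CLAIMED_PAGES_PER_DAY = 33_000_000

total_gpus = SERVERS * GPUS_PER_SERVER
lower_bound = total_gpus * PAGES_PER_GPU_PER_DAY
per_gpu_needed = CLAIMED_PAGES_PER_DAY // total_gpus

print(total_gpus)      # 160
print(lower_bound)     # 32000000
print(per_gpu_needed)  # 206250
```

With 160 GPUs, the lower bound works out to 32 million pages a day, and reaching 33 million would require about 206,250 pages per GPU, which fits the “over 200,000” claim.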
