monyet.cc
Luu Tuyen to Technology@lemmy.world • English • 8 months ago

TikTok’s parent launched a web scraper that’s gobbling up the world’s online data 25-times faster than OpenAI

fortune.com

The crawler, dubbed Bytespider, is scraping the internet at 3,000 times the rate of crawlers run by other genAI companies such as Anthropic.
  • @jagged_circle@feddit.nl • 11 • 7 months ago

    I think a common nginx config is to just redirect malicious bots to some well-cached terabyte file. I think Hetzner hosts one, iirc.
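
A minimal sketch of the redirect idea described above, written in Python rather than as actual nginx configuration. It assumes bots are identified by their User-Agent header, which the comment does not specify; the pattern and the target URL (`LARGE_FILE_URL`) are hypothetical placeholders, not a real hosted file.

```python
import re

# Hypothetical values for illustration only: the pattern and the target URL
# are placeholders and do not come from the thread.
BAD_BOT_PATTERN = re.compile(r"bytespider", re.IGNORECASE)
LARGE_FILE_URL = "https://example.com/large-file.bin"


def route_request(user_agent: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request.

    Matching user agents get a 302 redirect to the large file;
    everything else is allowed through with a 200.
    """
    if BAD_BOT_PATTERN.search(user_agent or ""):
        return 302, {"Location": LARGE_FILE_URL}
    return 200, {}


if __name__ == "__main__":
    print(route_request("Mozilla/5.0 (compatible; Bytespider)"))
    print(route_request("Mozilla/5.0 Firefox/128.0"))
```

User-Agent matching only catches crawlers that identify themselves honestly, so this is a starting point rather than a complete defense.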

    • Something Burger 🍔 • 16 • 7 months ago

      https://github.com/iamtraction/ZOD

      42kB ZIP file which decompresses into 4.5 PB.
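
To put the quoted figures in perspective, here is a rough calculation of the implied compression ratio, next to what a single DEFLATE pass achieves on all-zero data. It assumes decimal units (1 kB = 10^3 bytes, 1 PB = 10^15 bytes); the 10 MiB test payload is an arbitrary choice for illustration and says nothing about how the linked archive is actually built.

```python
import zlib

# Figures quoted in the comment, assuming decimal units.
claimed_compressed = 42 * 10**3      # 42 kB
claimed_expanded = 4.5 * 10**15      # 4.5 PB
claimed_ratio = claimed_expanded / claimed_compressed

# Illustration: compress 10 MiB of zero bytes in a single DEFLATE pass.
payload = bytes(10 * 1024 * 1024)
compressed = zlib.compress(payload, 9)
measured_ratio = len(payload) / len(compressed)

if __name__ == "__main__":
    print(f"claimed ratio:  about {claimed_ratio:.2e} to 1")
    print(f"measured ratio: about {measured_ratio:.0f} to 1")
```

The claimed ratio works out to roughly 10^11 to 1, many orders of magnitude beyond what one pass over zeros gives, so an archive like this has to rely on more than a single layer of plain compression.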

      • @WhyJiffie@sh.itjust.works • 3 • 7 months ago

        wouldn’t it be trivial to defend against that with a hash check if the size matches?

        though I guess it’s possible to create your own that differs
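
A sketch of the check discussed above, from the scraper's side: compare the size first, then the hash, against a list of known archives. The blocklist contents and the `known_archive` bytes are hypothetical stand-ins. The last lines illustrate the caveat raised in the reply: an archive that differs by a single byte has a different hash and passes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a known malicious archive.
known_archive = b"PK\x03\x04 stand-in bytes for a known archive"

# Blocklist keyed by size, so the hash is only computed when the size matches.
KNOWN_BAD = {len(known_archive): {sha256_hex(known_archive)}}


def is_known_bad(data: bytes) -> bool:
    """Cheap size check first, then a hash comparison."""
    hashes = KNOWN_BAD.get(len(data))
    return hashes is not None and sha256_hex(data) in hashes


# Same length, one byte changed: the hash no longer matches.
variant = known_archive[:-1] + b"X"

if __name__ == "__main__":
    print(is_known_bad(known_archive))
    print(is_known_bad(variant))
```

Because any custom-built archive evades a blocklist like this, capping the number of bytes written during decompression is the more robust guard.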

Technology@lemmy.world

!technology@lemmy.world

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech-related news or articles.
  3. Be excellent to each other!
  4. Mod-approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts; they are OK to post as comments.
  8. Only approved bots from the list below may post; this includes using AI responses and summaries. To ask whether your bot can be added, please contact a mod.
  9. Check for duplicates before posting; duplicates may be removed.
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


  • @L4s@lemmy.world
  • @autotldr@lemmings.world
  • @PipedLinkBot@feddit.rocks
  • @wikibot@lemmy.world