• Tar_Alcaran
      45 days ago

      The big problem with training LLMs is that you need good data, but there’s so much data that you can’t realistically separate all the “good” from the “bad” by hand. You end up using the set of all data, plus a much, much smaller set of data that has been tagged and marked as “good”.
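
To make that concrete, here is a rough sketch of one way a small hand-labeled set can be put to work on a big unlabeled pile: fit a crude word-based quality scorer on the labeled examples, then use it to score everything else. This is only an illustration, not how any particular LLM is actually trained. The function `train_quality_scorer`, the toy documents, the small "bad" set (which the comment above doesn't mention), and the keep-if-score-above-zero cutoff are all assumptions made up for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Very naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_quality_scorer(good_docs, bad_docs):
    """Build a word-level log-odds scorer from a small labeled set (hypothetical helper)."""
    good_counts = Counter(w for d in good_docs for w in tokenize(d))
    bad_counts = Counter(w for d in bad_docs for w in tokenize(d))
    vocab = set(good_counts) | set(bad_counts)

    # Add-one smoothing so a word seen in only one class does not give log(0).
    good_total = sum(good_counts.values()) + len(vocab)
    bad_total = sum(bad_counts.values()) + len(vocab)
    log_odds = {
        w: math.log((good_counts[w] + 1) / good_total)
        - math.log((bad_counts[w] + 1) / bad_total)
        for w in vocab
    }

    def score(text):
        # Words never seen in the labeled set contribute nothing.
        return sum(log_odds.get(w, 0.0) for w in tokenize(text))

    return score


# The small, hand-tagged set (toy data, invented for this sketch).
labeled_good = [
    "the function returns a sorted list of integers",
    "gradient descent updates the weights each step",
]
labeled_bad = [
    "click here buy now free free free",
    "win a free prize click now",
]

# Stand-in for the huge pile nobody can label by hand.
unlabeled_pool = [
    "the model updates the weights with gradient descent",
    "free prize click here now",
    "a sorted list of integers",
]

score = train_quality_scorer(labeled_good, labeled_bad)

# Assumed cutoff: keep anything that looks more "good" than "bad".
kept = [doc for doc in unlabeled_pool if score(doc) > 0]
print(kept)
```

The point of the sketch is the ratio: a handful of labeled examples ends up deciding what happens to a pool that is far too large to inspect, so any mistakes or biases in that small set get amplified across everything it scores.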