• obbeel@lemmy.eco.br · 23 hours ago

    You mentioned smaller models achieving better results than ChatGPT, but those models struggle to extend their knowledge across a wide variety of topics, as shown by their subpar performance on general-knowledge benchmarks like GPQA.

    • pcalau12i@lemmygrad.ml · edited · 13 hours ago

      Personally I think general knowledge is kind of a useless metric, because at that point you’re not really developing “intelligence”, just building a giant dictionary, and of course bigger models will always score better simply because they are bigger. In some sense, training an ANN is kind of like running a compression algorithm over a ton of knowledge: the more parameters, the less lossy the compression, and the more the model knows. But having an absurd amount of knowledge isn’t what makes humans intelligent; most humans know very little. It’s problem solving. If we had a problem-solving machine as intelligent as a human, we could just give it access to the internet for that information. Making a model bigger so it holds more general knowledge isn’t, imo, genuine “progress” in intelligence. The recent improvements from adding reasoning are a better example of genuine gains in intelligence.

      These bigger models only score better because they have memorized so much that they have already seen similar questions. Genuine improvements to intelligence, and real progress in this field, come when people figure out how to improve results without more data. These massive models already train on more data than any human could access in hundreds of lifetimes; if they aren’t beating humans on every single test with that much data, then clearly something else is wrong.