Like others have said, video hosting is way more resource intensive than text-based systems. Even discounting higher-res vids, if you go to any random larger YT channel and download all their videos in 144p, 480p, and 720p, it'll be quite a lot larger than you might expect. Sure, if you're serious about it you could get an array of hard drives and a small server, but you're talking hundreds of bucks and lots of upkeep. Outsource it to a VPS and AWS buckets and you've still got upkeep, but now you've added an extra 0 to your bill.
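For a rough sense of scale, here's a back-of-envelope sketch of the storage math. The bitrates, video count, and average length are all placeholder assumptions I picked for illustration, not measured YouTube figures, so plug in your own numbers.

```python
# Back-of-envelope storage estimate for mirroring one channel.
# Every number here is an illustrative assumption, not a measurement.

# Assumed average bitrates in kilobits per second for each resolution.
ASSUMED_BITRATES_KBPS = {"144p": 100, "480p": 1000, "720p": 2500}


def channel_storage_gb(num_videos, avg_minutes, bitrates_kbps=ASSUMED_BITRATES_KBPS):
    """Return total size in decimal GB when every video is kept at every listed resolution."""
    total_seconds = num_videos * avg_minutes * 60
    total_kilobits = total_seconds * sum(bitrates_kbps.values())
    # kilobits -> kilobytes (divide by 8) -> gigabytes (divide by 1,000,000)
    return total_kilobits / 8 / 1_000_000


if __name__ == "__main__":
    # Hypothetical channel: 1000 videos averaging 10 minutes each.
    print(f"{channel_storage_gb(1000, 10):.0f} GB")
```

With those made-up inputs, one hypothetical channel comes out to about 270 GB, and that's before backups, redundancy, or bandwidth.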
There aren't enough charitable nerds on the internet to host even a fraction of 1% of YouTube. It's even worse if self-hosting instances is pushed. Even as a fellow tech nerd, there's no way I'm hosting my own instance just so I can share a video once in a blue moon. Something that always gets my goat in fediverse discourse is when people jump straight to saying something along the lines of "just host your own", then wonder why AP went from ~2.5M users to 0.8M users.
There are also some Fediverse-specific issues that hold back a more mainstream audience. There's some fringe political stuff on both sides of the aisle which can pretty easily scare people off, and defederation combined with PeerTube's more siloed approach makes discovery near nil. (AFAIK you can't see content from a remote PeerTube instance unless somebody on your local instance has already subscribed to that channel on the remote instance.)
Then there are the new-platform (or in this case, many platforms connected via one protocol) issues: lack of users, limited or no monetization, limited development and support, and, at first glance, very few pros and a lot of cons for somebody who doesn't consider tech a hobby and is comparing it to established platforms.
Edit: Can't remember who, but iirc a PeerTube user I follow regularly deletes their videos because their host doesn't give them much space. It's great as a less big-tech way to see their latest videos, but that's not acceptable if anyone's gonna bill something like that as the next big video platform.