Tricky Bits with Rob and PJ
Rob Wyatt and PJ McNerney discuss the latest and greatest news in the tech world and try to figure out where things have been, where they are, and hopefully where they are going.
Augmented Reality - Part 1 - Deep Dive
In our first two-parter, Rob and PJ explore VR, AR, and XR, discussing some of the challenges that these related pieces of technology have faced, how companies have tried to solve these problems, and where limitations still exist.
Unafraid to dive into the deep details, Rob and PJ discuss optics, latency, cameras, and displays, going down the rabbit hole of technologies that form the backbone of all things VR, AR, and XR today, and help explain why we don't quite have super vision yet.
Hi folks, a little preamble to this episode. One of the goals that Rob and I have for this show is to do technical deep dives. And with this particular episode, we cover a really large amount of stuff. So rather than try to water it down, what we decided to do was to break this up into our very first two-part episode. So this will be our first episode on augmented reality. We hope you really enjoy it, and we hope you stick around and decide to download the second episode as well.
pj_3_01-22-2024_100459:All right folks. Welcome back to another episode of Tricky Bits with Rob and PJ. This go-around, we're gonna dive into a really fun topic and we're gonna go kind of deep into it: augmented reality. We're gonna meander a bit through how we've gone from VR to AR to XR, and talk a little bit about some of the devices that have been on the market, that are on the market, that are coming on the market, and get into why this stuff is hard. What are the challenges, and how has it evolved over time? And maybe a little bit of where's it going? Is it gonna take off? Hmm. Now, Rob, I'll admit that I've had a tiny bit of experience in AR. I played around with the Qualcomm library on iOS years ago, and I've dabbled a bit in ARKit and ARCore. But I believe you might have slightly more experience than me in this area. Uh, can you talk a little bit about some of the places you've done some AR/XR stuff at previously?
Track 1:Yeah, I've worked on pretty much all of the commercial AR solutions out there, and a lot of VR platforms, other than HoloLens. It's the only one I've never actually even seen. I've never put it on my head, never looked at the SDK, never done anything with it. But I have worked on the Oculus hardware, and I've worked at Magic Leap. Uh, I did the whole graphics architecture stack there, from basically the core of the motion-to-photon problem, as we call it. And we'll get into what that means later. And then more recently I worked at Apple on the Vision Pro, and obviously it's out next month, so we can dive into some details on that as well. I also did a small stint at DAQRI. It wasn't really AR related, but kind of was. It was just kind of future-tech holographic-type displays, which have similar rendering problems to AR, but not really the same display and visualization. So I've been in the trenches on this for going on 10 years now.
pj_3_01-22-2024_100459:Now, to kind of baseline for folks across the board here, we've got virtual reality, we've got augmented reality, we've got mixed reality, and they're all kind of related to each other. I think it's worthwhile that we maybe take a little trip down memory lane to help lay some groundwork for a lot of the technical problems we're gonna talk about. So let's talk VR for a moment. Uh, in one sense, it seems like, oh, VR basically is, I'm rendering two images and I'm good to go. I've got my stereo vision, and what other problems could there be?
Track 1:So VR has been around since it was first conceived as, like I said, stereo imagery. We figured out early on that if we render two images for our eyes that are slightly different, your brain will perceive it as a 3D image. And this was done in like the 1890s. It was done with real photographs all the way back then. So the optical trick has always been known, at least in modern history. And then when we get to 3D, in the late seventies, the eighties, we started to think, oh, we can render two separate images and we can basically offset the projection matrices of what you'd normally use for a 3D view. We can basically offset them a little bit for the gap between your eyes and render the same scene twice, each from a slightly different viewpoint. And your brain will perceive them as a 3D image, but it doesn't quite perceive it as a 3D image. There's still the issue that you put a screen in front of your eyes. So the 3D imagery is telling your brain that it's 3D and you should perceive it as 3D, but your eyes know that they're focused on a plane just an inch away from your face. So this gives some people incredible headaches. Um, myself included, believe it or not. I've worked on all these platforms and they all give me headaches. And so there's a lot of problems which have not yet been solved with accommodation and focus and things like that. So that brings us to modern VR, where we literally put a high-resolution screen in front of your face. We render two images. That gives us a stereo view. But part of VR is also the tracking of your head. Like, which way am I looking? I could trick myself into seeing a 3D image, but how do I trick myself into seeing a moving 3D image when the movement's from my own head? So I move my head to the left, and I want the world to move as if I really moved my head. And that was solved more recently with modern IMUs: accelerometers, magnetometers, and gyroscopes. 
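The per-eye offset Rob describes can be sketched in a few lines. This is a minimal illustration, assuming a head position, a unit vector pointing to the viewer's right, and an eye gap of 63 mm; the function name `eye_positions` and the default value are illustrative assumptions, not taken from any particular SDK.

```python
def eye_positions(head_pos, right_dir, ipd_m=0.063):
    """Return (left_eye, right_eye) world positions.

    head_pos  : (x, y, z) midpoint between the eyes, in meters
    right_dir : unit vector pointing to the viewer's right
    ipd_m     : assumed gap between the eyes, in meters
    """
    half = ipd_m / 2.0
    # Shift the head midpoint half the eye gap to each side.
    left = tuple(p - half * r for p, r in zip(head_pos, right_dir))
    right = tuple(p + half * r for p, r in zip(head_pos, right_dir))
    return left, right


left_eye, right_eye = eye_positions((0.0, 1.7, 0.0), (1.0, 0.0, 0.0))
print(left_eye, right_eye)
```

The same scene is then rendered once from each of these two positions, which is what produces the slightly different images for each eye.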
The original Oculus, back when it was still a Kickstarter project, did an amazing job at this. It was kind of janky looking back, but they kind of put it all together with modern tech. I think Palmer Luckey and those guys did a great job at putting that first Oculus together. And as technology progressed, it got better. But a key part of that is the head tracking and head prediction and minimizing latency in order to make it even possible to do VR. 'Cause if you think about input in a video game, for example, you read the controller, which might take two to three milliseconds. Then in frame one, the CPU does all its update work and does all of the updating of the 3D objects. In frame two, the GPU renders what frame one generated, and then finally it gets scanned out over HDMI and displayed on the TV, which might also take time based on how much processing the TV does. So even though a game is running at 60 frames per second, the perceived latency from hitting a button to seeing the result on the screen could be 70 or 80 milliseconds, which is way too long for something like AR or VR to work. We need that to be ten, eight, five milliseconds. And that's what we call motion to photon: from when you move your head to when you see the result, you need that to be about five to 10 milliseconds for an ideal experience. If it starts to go outside of that, you start to perceive lag. You start to get things which also induce headaches in some people.
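As a rough illustration of the latency arithmetic above, here is a sketch that adds up the stages of a conventional game pipeline. The stage durations are illustrative assumptions (one 60 Hz frame is about 16.7 ms; the scan-out and TV processing figures are made up for the example), not measurements of any real system.

```python
FRAME_MS = 1000.0 / 60.0  # duration of one frame at 60 frames per second

# Hypothetical stage timings for a console-style input-to-display pipeline.
stages_ms = {
    "controller read": 3.0,
    "CPU update (frame 1)": FRAME_MS,
    "GPU render (frame 2)": FRAME_MS,
    "scan-out over HDMI": FRAME_MS,
    "TV processing": 20.0,
}


def total_latency_ms(stages):
    """Sum the per-stage latencies to get end-to-end latency."""
    return sum(stages.values())


total = total_latency_ms(stages_ms)
print(f"end-to-end latency: {total:.1f} ms")
print(f"over a 10 ms motion-to-photon budget by {total - 10.0:.1f} ms")
```

With these assumed numbers the total lands in the 70 to 80 millisecond range mentioned above, even though every individual stage keeps up with 60 frames per second.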
pj_3_01-22-2024_100459:Basically it's a seasickness problem at that point in time. If the whole world is kind of just lagging or shifting, it just feels uncomfortable.
Track 1:Yeah. And what I just talked about with all the latency and the lag, we'll get into how we fix that. It applies to AR and VR equally, but AR has more stringent requirements. So VR is kind of a laxer version when it comes to some of these things, because you're not dealing with the real world.
pj_3_01-22-2024_100459:With VR, we can guarantee that the frame is coming in, or we have the opportunity, if we have enough hardware or we get the pipelines correct, of ensuring that that frame is gonna land at the right time. Like, we control the entire frame because it's synthetic.
Track 1:Yeah, we can control that it's rendered at the correct position, but your head might no longer be in that position. So even with the most powerful hardware, you still have to minimize latency if you want to utilize the hardware to its full extent. You can stick a 4090 in there and it won't make the latency any better, and you'll still get head lag. So it's not just about render performance. So we do tricks like, okay, your head's here right now, but that's no use to us because in the 30, 50, 70 milliseconds it's gonna take us to render the data and actually display it, your head's gonna have moved. So we use head prediction. We go, okay, you're here now. You've been moving like this. We're gonna predict where your head's going to be. We might do a couple of these predictions. We might do a prediction of where your head is going to be for the update frame. And we can make that frustum bigger so we're not culling objects which you might potentially see. 'Cause if you say, okay, your head's here and I'm gonna tightly cull to the view frustum, and then it's a few degrees off when you do the final prediction and you warp the frame or you regenerate some of these objects, you might not have had them in the render buffer to start with. So you've gotta kind of over-cull, so you have some objects that are slightly outside your view. So that's one thing that we can do: we can cull a set of objects with one camera and render with a different camera, which is why we need the over-culling. So then we have to predict further into the future, and the further you predict, the less certain you are and the more error you will have. We could also predict like, okay, when this frame hits your eyes, we're gonna put it here. But obviously you can't ever predict to the point when it hits your eyes, 'cause the frame's already generated and it took time to generate that frame. 
If it took 10 milliseconds, then you had to predict 10 milliseconds earlier and now you've got an inherent 10 milliseconds of latency. But what you can do is what we call a late frame warp. We rendered the image 10 milliseconds ago when we predicted what that frame was. And then as we're scanning it out, we know exactly where your head is facing right now. So we can do a more trivial warp, just a 2D affine warp on the final image, to kind of warp it into the correct place. But that can only really do rotational things, because if you start to move the position, then you get parallax errors because you've now got a 3D image that's not quite right. So then you get into more complicated warps, filling in the background objects and potentially using AI these days to fill in what it thinks was behind the object, which is now not visible or is visible, and lots of problems arise. So minimizing latency is a very hard problem, and Oculus originally took a very good stab at this, and over time it's been refined very, very well. There are head models of how your head moves, and having two IMUs on the headset instead of one. Because ideally we'd put an IMU in the center of the motion, which is in the center of your head, which we can't do. So we can put two, one on each side, in a known location relative to where the center is. And then you can take movement from the two IMUs and predict what the center rotation is.
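The predict-then-correct idea can be sketched for a single axis. This is a minimal sketch under stated assumptions: yaw only, constant angular velocity for the prediction, and a small-angle late warp expressed as a horizontal pixel shift with a constant pixels-per-degree figure. The function names and numbers are hypothetical, and real headsets use full 3D pose prediction and more careful reprojection.

```python
def predict_yaw(yaw_deg, yaw_rate_dps, lookahead_s):
    """Extrapolate head yaw assuming constant angular velocity."""
    return yaw_deg + yaw_rate_dps * lookahead_s


def late_warp_shift_px(rendered_yaw_deg, latest_yaw_deg, px_per_deg):
    """Horizontal shift (positive = right) that re-aims a frame rendered
    for rendered_yaw_deg at the freshly measured latest_yaw_deg.
    Rotation only: it cannot fix parallax from positional movement."""
    return (rendered_yaw_deg - latest_yaw_deg) * px_per_deg


# Head at 10 degrees, turning at 100 deg/s, frame takes 10 ms to render.
rendered_yaw = predict_yaw(10.0, 100.0, 0.010)

# At scan-out the IMU says the head slowed down and only reached 10.8.
shift = late_warp_shift_px(rendered_yaw, 10.8, px_per_deg=20.0)
print(rendered_yaw, round(shift, 3))
```

The prediction gets the frame close, and the late warp absorbs whatever error is left at scan-out, which is why the two are used together.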
pj_3_01-22-2024_100459:So basically interpolate between those two fixed positions to say this is what we believe the center would actually look like.
Track 1:It's not even a belief. It's known. I mean, we know they're on the outside of a shape, and if this one moves forward and this one moves backwards, then the center probably didn't move. And then if you move left or right, you can very easily imagine two sensors on the outside of a circle and you're predicting how the middle one at the center of the circle moves. It's not that difficult. So there's been a lot of math, there's been a lot of geometry processing, there's been a lot of prediction improvements. Faster IMUs, too. You can read them at maybe, when I was doing this, about a thousand times a second. So you get a thousand samples per second. So how do you even get that into a processor? If you're getting it late, it's no use to you. You need it right now. So within a given game frame, you get potentially 16 samples from the IMU, and you're using those for prediction. You can't be sampling this once per frame and using that for prediction. You've got to be taking the high-speed input, predicting where it's going, and then using that for rendering. So once you've done all this for VR, you get a pretty stable image. You can get that perceived latency down to 10 milliseconds, which is acceptable for VR. Even 16 is acceptable. If you are a frame off on your head when you're moving it, it's not too bad. If you start getting into a hundred milliseconds, and you move your head and the image lags behind you a hundred milliseconds, that's when you get the seasickness problem. If you keep your head still, it's perfectly fine because the delay is not visible. But if you rapidly turn your head sideways, left or right, the image stays still for a hundred milliseconds and then rapidly moves, and then stays still a hundred milliseconds later. It's very nauseating and it's not acceptable for a consumer product. Then we get into AR.
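The two-sensor geometry can be sketched with rigid-body kinematics. This is an idealised illustration: it treats each sensor as reporting a forward velocity, whereas real IMUs report acceleration and angular rate and need filtering and integration. It also assumes the two sensors sit symmetrically about the center of rotation. The function names are hypothetical.

```python
def center_forward_velocity(v_left, v_right):
    """Forward velocity of the midpoint between two symmetric sensors.
    For a rigid body the rotational parts cancel in the average."""
    return (v_left + v_right) / 2.0


def yaw_rate_rad_s(v_left, v_right, separation_m):
    """Yaw rate from the difference in forward velocity of the two
    sensors. Positive means turning left (counterclockwise from above)."""
    return (v_right - v_left) / separation_m


# Left sensor moves backward, right sensor moves forward by the same amount:
# the center does not translate, the head is simply rotating.
v_center = center_forward_velocity(-0.1, 0.1)
omega = yaw_rate_rad_s(-0.1, 0.1, separation_m=0.2)
print(v_center, omega)

# At the rates mentioned above: about 1000 IMU samples per second
# against a 60 Hz frame leaves roughly 16 samples per frame.
samples_per_frame = 1000 // 60
print(samples_per_frame)
```

This matches the "one forward, one backward, center didn't move" case described above; a pure forward push would show up as equal velocities on both sensors and zero yaw rate.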
pj_3_01-22-2024_100459:With AR, we're gonna specifically be talking about the approach HoloLens or Magic Leap took, where we're basically painting the augmentation onto some sort of glass pane, but the rest of the world is able to fly through that glass pane easily. Like, there's no screen like in VR. You're just getting photons from the real world at this point in time.
Track 1:Yes. That's what augmented reality is. Virtual reality is where the entire scene is virtual and it's all 3D rendered. How good that looks varies: it could be janky 1970s-style graphics, or it could be Unreal 5 doing its best rendering possible. That's where the performance of the hardware comes in. You still have to fix the latency problem no matter how good the GPU is. But yes, going back to your point about augmented reality, the real world in this case is the real world. You are wearing some sort of glasses, which could be a headset, it could be a pair of glasses, it could be some cyberpunk-looking things, but ultimately it's clear glass with some sort of waveguide optics in it. So you can see the real world and you can draw anywhere in the glass, at least where the waveguide is, which for Magic Leap was about, I don't know, 90 degrees field of view. It wasn't a very big viewport. It was very much at the center of the glass. So this all sounds like a heads-up display, and that's all it is. They've been around for years. Fighter pilots have had them since the seventies, and they're mostly used in things like that. It's great for a fighter pilot, but this is head-static information. If you move your head, the information you have in your display stays with your head. So the pilot can look around anywhere he wants and still see all of his flight instruments directly in front of his face. Augmented reality takes the head tracking information that we can get from modern IMUs, the same thing we did for VR, and starts to go, okay, I want to pin a pixel to the world. Not to your head, to the world. So this pixel is on the door. If I move my head, I have to re-render that pixel in a different place based on where my head is. And the real world's the real world, it's going to move instantly. How quickly we can recompute where that pixel needs to be in the display and put it back there is how good the AR is. 
And the problem with AR is you don't just get the seasickness lag, because the real world's gonna move instantly. You get this weird, uh, shimmy where the virtual object just kind of slides around, and how much it slides is basically a measurement of the lag.
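The relationship between lag and slide can be put into numbers. This is a back-of-the-envelope sketch, assuming a pure head rotation at a constant rate and no prediction or late warp correcting for it; the function names and example values are illustrative.

```python
import math


def slide_angle_deg(head_rate_dps, latency_s):
    """Angle by which a world-pinned virtual object lags behind the
    real world during a head turn, with uncorrected latency."""
    return head_rate_dps * latency_s


def slide_distance_m(head_rate_dps, latency_s, object_distance_m):
    """Apparent sideways displacement of the object at a given distance."""
    angle = math.radians(slide_angle_deg(head_rate_dps, latency_s))
    return object_distance_m * math.tan(angle)


# Turning the head at 100 deg/s with 20 ms of motion-to-photon latency,
# looking at a virtual object pinned to a door 2 m away.
angle = slide_angle_deg(100.0, 0.020)
slide = slide_distance_m(100.0, 0.020, 2.0)
print(f"{angle:.1f} degrees, about {slide * 100:.1f} cm of slide")
```

Because the real door does not move at all, even a couple of degrees of error is visible as the object sliding against it, which is why AR needs tighter latency than VR.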
pj_3_01-22-2024_100459:So you're racing against the speed of light, is really the problem here.
Track 1:You're racing against the speed of light and how fast your brain can recognize that an object's moved. And there hasn't been a lot of research on the brain side of this as to, like, how quick is that and what movements are we more sensitive to than others. A lot of that's because we didn't have the technology to do the tests. We had some tests we've done, and a lot of that was used in the early AR. But until you get the device and you can start to do research on the results, it's hard to predict what your brain's going to do. So a lot of this is happening now, and as AR moves on, it will get better and it'll get more involved in the psychoanalysis side of how we perceive these things. But for now, it's very cut and dry. It's like, you moved your head, I've got to move these pixels over here so that you perceive them to be in the same place. And that is kind of what Magic Leap's thing was. They took the head tracking tech that, uh, was around at the time, improved upon it, and added it to a head-static type display, a modern version. It's all waveguided. It doesn't prohibit your forward view of the world too much. But it comes with a huge set of problems. With heads-up displays, the information is always right in your face. It's very much notification style. I know people make glasses that have displays in them, which are for biking or for skiing, and they tell you your speed and things like that, and it's always in your face. It's very heads-up display style. When it comes to true augmented reality, you've now gotta deal with the real world, and it's really difficult. Occlusion isn't a problem in VR. You rendered the 3D scene, so the depth buffer takes care of all of the occlusion. But now you have an object in augmented reality, which is just in space. If that object moves backwards and goes through a wall, and it remains visible, your brain does not like that one bit. Your eyes touching. It's like, oh, that's horrible. 
It's like, what just happened? Likewise, if an object goes through an open door, or goes round the corner, it needs to be occluded by the real world. Otherwise, you get the same problem. You can see through walls, you can see the objects behind walls, and these are the problems which make AR difficult. The technology of AR is literally the same as VR, just with lower latency. The difficulty is the technology required to get it to work in an acceptable manner. And that comes down to scene understanding, where I understand what I'm looking at. So if I render this 3D object, I need to clip it here so it fits the geometry of the real world. But then how do you detect the geometry of the real world? Are you using stereo cameras? Are you using depth cameras? Are you using some lidar-type device? All of this is on your head, so how do you make the headset light, to make it so it's wearable and ideally wearable for long periods of
pj_3_01-22-2024_100459:Each of these different sensors also are gonna solve potentially different problems, right? I mean, I could use lidar, I could use stereo cameras, I could use depth cameras. But you know, you'll run into degenerate scenarios where it's like, oh yes, that person went through like past a wall, but that wall happens to be made of glass. So not only do I need to understand the physical geometry of the real world, I need to also understand the material properties so that you should be able to see a character walk behind a glass wall. Correct. And maybe it should be distorted with refraction, uh, or reflective properties there. But a, a standard depth camera might not give you that. It might just say, oh, that's a wall and I'm gonna clip you now
Track 1:A standard depth camera doesn't even give you that. The data you get back is
pj_3_01-22-2024_100459:point clouds of depth. Right?
Track 1:Yeah, so Magic Leap had some games, and they were kind of things like Minecraft-style things on a tabletop, or they had a game where it would open a portal on a wall and aliens would fly out of the portal and you'd shoot them, uh, by looking at them and clicking a button type thing. And even finding a flat surface is difficult, because when you get close, a surface isn't flat. You see some of this in the AR videos you've seen: they'll build a point cloud of your environment, using SLAM-type technology to map out where you've been within your household, the room you're in. And that data is very noisy. Flat surfaces are not flat. And then you get into problems like, if I use the point cloud as the reference that I'm gonna render against and I render on top of that, then the depth buffer from the point cloud will clip the 3D objects prematurely, before they're on the surface. So you get these weird gaps and these weird bottoms to objects where they're supposed to be sitting on the 3D surface, and it's these problems which need to get solved to make AR usable. And it's why AR isn't usable. It's fraught with issues. You mentioned some of them: how does glass work, how do mirrors work? And my house isn't your house. Once you put VR on, it doesn't matter what room I'm in, it's entirely virtual. The light from the outside world is blocked out and you're now in this other world. It doesn't matter whether you're in daylight, dark, indoors, outdoors, it'll always be the same. AR is very dependent on the environment you're in. If the sun's out, for example, or like my house, which has lots of glass, it just wouldn't work. There's so much bright light coming in. We are limited in how bright we can make the displays. We don't have the dynamic range the sun has. Cameras don't have enough dynamic range to handle local objects if the sun is in the frame. You see it all the time on your phone. 
You've got lots of, uh, exposure problems. It gets better with HDR, but we didn't have HDR back then, and even then it doesn't solve the problem. And cameras have a much wider dynamic range than displays have. To compound the problem, with Magic Leap and HoloLens-style AR the light you add in the glass, the virtual light, is additive. You can't subtract light from the real world. You can only add to it. Which means you can't easily do shadows. I can't subtract light, so I can't make it look like the virtual object has a shadow. And we came up with some tricks to handle this. You could do things like render a gray polygon over the whole frame, and then you could subtract from the gray to simulate the shadow, but that reduces the dynamic range of the display. And now you're kind of washing out the real world, because you're adding gray light to everything coming in from the real world. And it's things like, okay, I have a shiny object, and I opened the door and the light came in. How quickly can that shiny object respond to an environmental light? Imagine taking an AR headset to, like, a nightclub where there's lights flashing all over the place. How fast could these virtual objects actually respond to it? How do they respond, is the other question. So now you've got more cameras on your head, because now you've got to see the incoming light that, uh, the real world is seeing. You've gotta process that and make it into some sort of environment map that you can project back onto a 3D object to simulate it being a shiny object in the real world. And then you get into problems of, okay, I looked left, but I never looked right, so it doesn't actually know what's over there. So does it make it up? Does it make you do a full 360 with your head as part of the user experience? That's not very natural. 
It takes time to build up this model of the entire environment, and an object can reflect something that's behind you that you've never looked at or never will look at. So it can never be real. It's so difficult to do. You could have 360-degree cameras, but now you've got more sensors on your head and you've got more weird circular lenses on your head, so it's not very wearable.
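The additive-light constraint and the gray-polygon trick can be shown with plain numbers. This is a toy model under stated assumptions: light is a single linear value in arbitrary units, the eye sees the sum of real and virtual light, and the display can only emit between zero and some maximum. All names and values are illustrative.

```python
def perceived(real, virtual):
    """What the eye sees through an additive display: real plus virtual.
    The display cannot emit negative light."""
    if virtual < 0:
        raise ValueError("an additive display cannot subtract light")
    return real + virtual


def gray_pedestal_shadow(real, pedestal, shadow_strength):
    """Fake a shadow by adding gray everywhere and adding less gray
    where the shadow should be. Returns (lit, shadowed) values."""
    lit = perceived(real, pedestal)
    shadowed = perceived(real, pedestal - shadow_strength)
    return lit, shadowed


DISPLAY_MAX = 1.0
real_light = 0.5
lit, shadowed = gray_pedestal_shadow(real_light, pedestal=0.2, shadow_strength=0.2)
headroom = DISPLAY_MAX - 0.2  # display range left for actual virtual content
print(lit, shadowed, headroom)
```

The shadowed area is only darker relative to its grayed-out surroundings; it is never darker than the real world underneath, and the pedestal eats into the display range left for virtual content, which is the washed-out look described above.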
pj_3_01-22-2024_100459:It sounds like amazing neck exercises, you know, when you're wearing all of these sensors. And to be fair, this is only for a static scene, right? We're not even talking yet about dynamic scenes. Like, you have to look all the way around just to get a static scene correctly
Track 1:Oh yeah. It's so difficult. All of these problems are individually solvable in very contrived examples. But when you put them all together in the real world and just let a consumer do what they want in their own house, none of these problems are solved to an acceptable level where you'd be like, I could tolerate this. Like, AR right now is not even tolerable, 'cause it's so like, that's broke, that's broke, that don't work, that don't work. And then you go outside with it. None of them are really made to go outside, but let's take them outside, and now you've got the sun to deal with, and adding light to where the sun is in the glass makes zero difference, 'cause we can't add enough light. Uh, we can't subtract light, so we can't make it darker. We now get the problem of scale, and some houses get this too. We do all the tricks we can do with optics. We've got stereo cameras, we've got depth cameras. But if we do stereo disparity between two cameras, they're only six inches apart, 'cause the best you could do is put one on each side of your head. We could put them wider than your eyes, but we don't have the processing that your brain has to undo the information. So in reality, cameras are always as wide as they can go. So maybe they're an inch wider than your eyes, 'cause they're on the corners of a pair of glasses frames. We can only really derive depth from that stereo camera to about 20 or 30 feet, which is enough for indoors, but outdoors, it's irrelevant. I can look down my road and see all the way to Denver airport, which is 70 miles away, and I can easily tell that the plane is behind the mountain in the distance, although my eye resolution can't do that via stereo imaging. 
There's a lot more going on in your head than just stereo imaging. With pure stereo disparity from a pair of cameras, after about 20 feet a given point doesn't move at all between the images of the two cameras, so you'd have no idea whether it's close or whether it's far. You've got no depth information about that at all. So our brains are using things like, I know that's a mountain, I know that's a tree. And every now and then your brain will get it wrong. You'll look at something and you're like, oh, is that in front of or behind that? And you can't tell. And that's the AI side, where I think this is a problem that's solvable by AI, and we start to solve these problems in the same way our brain solves these problems. Your brain isn't purely doing, I'll take two images from two eyes and I'll figure out what the scene looks like. And you can prove that just by covering one eye up, and you can still see depth perfectly fine. You lose things like true depth perception; if you're trying to land a plane or land a parachute, covering one eye would be a bad idea. Uh, but in general, looking around in the world, if you only have one eye, you can still perceive the world in three dimensions. So there's a lot more going on
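The stereo range limit follows from the standard pinhole stereo relation, where disparity equals focal length times baseline divided by depth. This is a sketch under assumed numbers: a six-inch (0.1524 m) baseline as mentioned above, and a hypothetical focal length of 500 pixels. The exact distance at which depth becomes unusable depends on the real focal length, sensor resolution, and matching noise, so the figures here are illustrative only.

```python
def disparity_px(depth_m, baseline_m, focal_px):
    """Pixel shift of a point between the two camera images."""
    return focal_px * baseline_m / depth_m


def depth_error_m(depth_m, baseline_m, focal_px, disparity_error_px=1.0):
    """Depth uncertainty caused by a given disparity measurement error.
    It grows with the square of the distance."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


BASELINE_M = 0.1524   # six inches between the cameras
FOCAL_PX = 500.0      # hypothetical focal length in pixels

for depth in (1.0, 6.0, 30.0, 1000.0):
    d = disparity_px(depth, BASELINE_M, FOCAL_PX)
    err = depth_error_m(depth, BASELINE_M, FOCAL_PX)
    print(f"{depth:7.1f} m: disparity {d:7.3f} px, about {err:9.2f} m per pixel of error")
```

Disparity shrinks in proportion to distance while the depth error grows with its square, so distant points quickly become indistinguishable from each other, and anything on the scale of mountains and airports has effectively zero disparity.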
pj_3_01-22-2024_100459:Yeah, you, you can still have, you know, parallax understanding with one eye when you like look and you're like, oh, I see. Like how this is moving relatively to each other.
Track 1:mathematically, you shouldn't have, from a purely, if our eyes are just doing this processing, if, if it's
pj_3_01-22-2024_100459:Well, for a single frame. Yes. Yeah. Yeah. I.
Track 1:It shouldn't be. So you add all of this together and it's just not usable. Unfortunately, most of the killer apps for AR are outdoors. That's why there are no killer apps for AR, and that's why it was never adopted by consumers. Now, it was used a lot by enterprise, and the military use it for training, and these are all controlled environments. If you are using an AR headset on, for example, a conveyor belt, and you're looking at things coming down the conveyor belt, the installation of that system could dictate that there can't be any windows, there can't be any mirrors, and the lighting needs to be this bright all the time. It can't be darker, can't be lighter. There are all sorts of things you can specify for an enterprise factory-type environment that are absolutely not things you can specify for a house
pj_3_01-22-2024_100459:So to get specific on one of the applications, if I recall, DAQRI was very much in this niche industrial enterprise space. And if I recall correctly, they were having their headset be available for, like, oh, I need to repair this particular area of a naval vessel, which was, as you said, a very controlled environment, very locked down, so that, hey, I need to understand the schematics in this area to properly understand how to fix it,
Track 1:And DAQRI were entirely enterprise based. They had a few R&D teams which probed possible consumer spaces. HoloLens kind of went for the consumer space and realized it's much easier to do the enterprise side. Magic Leap was all in on consumer to start with. Now they're all in on enterprise, and they've missed the boat there too, 'cause everyone else is already there. Uh, they didn't understand the interactions of the technical problems. Magic Leap was a very badly run company. Very smart people solving problems, and management just putting up bullshit videos that were like, look, this is how it all fits together in an ideal world. And that was literally rendered footage. And it's the little details that we just talked about which make it impractical.
pj_3_01-22-2024_100459:At a
Track 1:AI can help with a lot of these problems. For example, and I always use this example, I have a game where I have a bad guy that runs around the kitchen. I could walk in your house, find a kitchen and go, oh, let's have a bad guy which runs around. And I could figure out in my head how the game would play in your space. I can go to my kitchen and figure out the same thing. Very different type of play, very different layout in the kitchen. But I can still visualize how such a bad guy would interact with my kitchen. That interaction in my kitchen is very different to the interaction in your kitchen. You have an island, you have two islands, I have no island, for example, but it's all in the kitchen. First of all, how do you find the kitchen? And then how do you have emergent gameplay for this type of game that can scale across the unknown number of kitchens that are out there? And how do you take that emergent behavior and tell the same story, for the same game, for the same person, in an entirely different environment? All unsolved problems. So it kind of makes it that it can't be done, and when people attempt it, it's typically in a very contrived environment. Coming back to it, I could make a game in my kitchen and it'll play great, but it would only play in my kitchen. So it's useless to anybody else, really. That's the same enterprise problem of, like, it only works in this ship, in this factory, and we have specialists who come and rework it for somewhere else. Possible in the enterprise space, not possible in the consumer
pj_3_01-22-2024_100459:One of the, uh, and maybe this gets to the, the, the whole notion of the contrived environment. I recall reading a lot, you know, this is eight years ago about Magic Leap. You know, they had Steven Spielberg in, or they had like, a lot of luminaries come in, got to try it out, like the test runs, like maybe it was in the warehouse or in the, in the building, and it was all hooked up still to, you know, massive computers. To what extent is there a, a big problem, not just to the contrived environment, but like, hey, like how do we shrink this to be power friendly and weight friendly for your head?
Track 1:So, Magic Leap never solved either of those, uh, unfortunately. So we've already talked about all the sensors you need. Most of these sensors need to be head relative, so they need to be on your head. The cameras need to move with your head. The IMUs need to move with your head. The depth sensor needs to move with your head, which is why so far we haven't seen a pair of glasses, and Apple are no closer than anybody else to making a pair of glasses, because where would they put all these sensors? They're not magic. They can't magically make a depth camera be a tenth of a millimeter square and just put it in the middle of your forehead. Apple can't do that. So they're playing with the same technology that exists in the real world as everybody else. And then you've got power. Where are you gonna put the processors? Where are you gonna put the, uh, battery? How are you gonna charge this thing? Which is why they always come with a puck. It's like, okay, the sensors are on your head; to keep the weight down, since the battery's heavy and the processor board's heavy, we'll just put a puck in your pocket. Then we'll have a cable, but now you've got a lot of data in that cable. You have all the camera feeds, and MIPI won't go that far. So now you've gotta do things. Okay, well, we've gotta take it off the camera bus and process it into something else and then send it over USB, Ethernet, or whatever you want to use for the interlink from the headset to the base. That's more electronics. Now we need all these conversion chips on your head as well, because we can't get the camera signals off the head, 'cause they'll only go six inches. In a phone, that's fine. But it's more than six inches from my head to my waist, and the cable needs to be longer than that too, because it needs to be comfortable to wear. So you get into all these form factor issues, and this increases power.
All these conversion chips and all this extra processing you have to do, which you don't think you have to do, adds to the power budget. And then you've got: how do you keep it cool, uh, how long can it run for, blah, blah, blah. So I think Apple are in a better position to fix some of this because they already have a phone, which could be a puck; it can certainly add to the experience. And it's a very powerful device, whereas Magic Leap didn't have such a thing. We kicked out Qualcomm and put in the Nvidia X2, and it was a much better move. The X2 is very powerful, it was at the time, and could do much more, but compared to the amount of processing required, it's pretty much irrelevant. And again, if you run it flat out all the time, and a lot of it does have to run flat out because it has to deal with all the display and all the movements and all of that, then the battery doesn't last very long. So coming back to your question: yeah, Magic Leap used to connect the headset to a PC, so now it's totally static. And they should have done this from day one and at least allowed it to be solved this way. Just sell it as a device for your PC. It's always powered. It doesn't necessarily have to slow down. It does that for power reasons, to conserve power, but it doesn't have to. It could run flat out nonstop. So they'd do this, and they'd also do all the tricks too. If you look at the old photos of Magic Leap, they had very creative placement of furniture, very weirdly designed rooms. And these rooms would have, like, checkerboard couches, which were used as part of the recognition, so it knew you were in that room. They'd have photos on the wall which, if you look closely, were actually like QR locators, and lots of things. Look very closely at the pictures from Magic Leap and they are all of these crazy rooms. And the irony of all this is Magic Leap bought the old Motorola building, 'cause it was owned by Google when Google bought Motorola.
It's this building in Plantation, Florida, and it's the old Motorola building. And they employed a bunch of old Motorola employees, so some employees got their old office back 10 years later. And so Magic Leap acquired this building from Google. Google acquired Motorola, had this building, didn't want it, so sold it to Magic Leap. Magic Leap got it and spent a fortune refitting it. How did they refit it? It's an AR company, where we know AR doesn't work with glass. They built a goddamn glass office. It was very Apple, everything terrazzo and glass, a very modern office. So the headset didn't even work anywhere in Magic Leap's offices. So they had these special rooms to the side, which were the old, fully doored offices, and they'd set them up as living rooms. These are the rooms that had all of the crazy-ass furniture in them, and it would recognize which room it was in based on some of this crazy-ass furniture. Like I said, it's checkerboard couches and QR codes and pictures and things like that, which didn't help, because that made the head tracking in some of these rooms super stable. The head tracking in these rooms was really good because the cameras would track the position of the QR codes on the wall, and from doing that for multiple ones, you could figure out exactly which way you were looking. It didn't help the latency problem, but it did help the stability, uh, problem.
pj_3_01-22-2024_100459:Well, from a development standpoint, was that actually a hindrance? Because did that give a false sense of security, it was
Track 1:It was a massive hindrance. It was a massive hindrance. All the demos relied on all of this bullshit stuff the real world didn't have. Um, so anybody outside of the offices just got garbage data, whereas inside the office you got great data, and it was never gonna work. They knew it wasn't gonna work, 'cause the technical people, and they had lots of good technical people, were saying, like, we can't do this. That's totally cheating. If we're gonna use a QR code for stability, we have to tell people we're using a QR code for stability. So, I mean, it wasn't that bad. It wasn't that obvious, but if you look closely, it's pretty obvious. And if you were there, it's like, this is not good. It's assistance, but it's not really... it's doing a lot of assistance, let's put it that way. That's why the demos in the office were good, and that's why they could sell it to all these people, 'cause it was fairly good in the office, but it was also very low resolution. Magic Leap had this great lab where they had the fully variable-focus displays and things like that. And it was a whole room. There was no way... it was an optical lab, a whole room. And you'd look through a pair of, like, binoculars and you'd get this great experience, 'cause it was all variable focus at every pixel. Really good stuff that would never be possible in a pair of glasses. And Magic Leap also was the only one at the time that did any sort of variable focus. And they didn't really do variable focus. They had two focal planes.
pj_3_01-22-2024_100459:I was gonna ask, does it require basically multiple lenses effectively? In
Track 1:Yes. And then each of those, that's RGB, 'cause there's a waveguide for R, G, and B. So a typical waveguide pair of glasses will have three waveguides. It'll have three lenses stacked back to back, 'cause the waveguides for red, green, and blue light are completely different.
pj_3_01-22-2024_100459:Oh, okay.
Track 1:And because they're not just waveguides, they're kind of tricky. The, uh, waveguide's designed that way because the light's injected at the side of the glass, which is the only place in a pair of glasses where you can inject the light. So they inject the light at the side of the glass. That is, tiny little OLED projectors or LCoS projectors that sit at the side of the lens, which is why there's always a big bulging piece on the side of these glasses for any sort of AR display. That's where the projector is. And it projects into the side of the glass, into the waveguide. The waveguide transparently takes it across the glass, perpendicular to the view direction, and then re-projects it forward into your eyes in a grid shape, and that's kind of how it works. But there's one for red, one for green, one for blue. And these are optimized for, like I said, wavelength and focus. So Magic Leap had two of these stacked up, so there were six lenses in
pj_3_01-22-2024_100459:Hmm.
Track 1:and it was an RGB image at two different focal planes. There's also issues with getting those focal planes, because they're physically apart. You've got red in front of green, in front of blue. The focal length's not the same for each piece of glass at this point, 'cause they're all a millimeter apart, and it's very... So this is what Magic Leap's optical tech was. They fixed a lot of those optical problems to get red, green and blue to focus. And then they did two of those and they ended up with two focal planes. So you could render close, and it would be something like 18 inches, which would be like interacting space, right here in the realm of where the depth camera can see. And then they had a second focal plane, which was at about two or three meters in the distance, and that was basically from there to infinity. And it was very hard to render to these things. How did you switch focal planes? If an object moved forward and back, would it flick focal planes? Would it blend between focal planes? How does blending work when it's all additive light? Uh, you can see both. So again, not an easy problem to solve. Having two focal planes may seem like a technical bullet point, but it causes all sorts of other problems which aren't unsolvable. And again, this is just fixing what you see; it's not even fixing how you interact with the world, as we've already talked about. Those problems are insurmountable even today. This was seven, eight years ago.
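As a rough illustration of the "blend between focal planes" question, here is a minimal Python sketch of one approach sometimes used for multi-plane displays: split an object's brightness between the two planes with weights that vary linearly in diopters (1 / metres). This is not claimed to be what Magic Leap did. The function name `focal_plane_weights` is hypothetical, and the plane distances (0.46 m for roughly 18 inches, 2.5 m for "two or three meters") are assumed values taken loosely from the conversation.

```python
def focal_plane_weights(depth_m, near_m=0.46, far_m=2.5):
    """Split an object's brightness between a near and a far focal plane.

    Assumption: weights vary linearly in diopters (1 / metres) between
    the two planes. near_m and far_m are illustrative values only.
    Returns (near_weight, far_weight), which always sum to 1 so the total
    additive light stays constant as the object moves in depth.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    near_d = 1.0 / near_m   # near plane in diopters
    far_d = 1.0 / far_m     # far plane in diopters
    d = 1.0 / depth_m       # object depth in diopters
    # t = 0 at the near plane, t = 1 at the far plane
    t = (near_d - d) / (near_d - far_d)
    # Objects closer than the near plane or beyond the far plane
    # are drawn entirely on the nearest available plane.
    t = min(1.0, max(0.0, t))
    return 1.0 - t, t


if __name__ == "__main__":
    for depth in (0.3, 0.46, 1.0, 2.5, 10.0):
        near_w, far_w = focal_plane_weights(depth)
        print(f"{depth:5.2f} m -> near {near_w:.2f}, far {far_w:.2f}")
```

Because the light is additive, an object in between is visible on both planes at once, which is exactly the "you can see both" problem: the weights only control how bright each copy is, not whether the eye accepts the result.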
pj_3_01-22-2024_100459:I find the problem we're talking about here with the focal lengths really interesting. It reminds me a lot of doing level of detail in games or even in movies: how do you figure out, when you're in between those two spaces, how best to render the object? And in this case, we're not talking about two, we're talking about six. Right? 'Cause
Track 1:But it's not even that. A bad LOD in a movie doesn't make you throw up, whereas this does. It could make you so nauseous, with lag and all the other effects that are going on at the same time. Bad clipping, like things that are visually at a distance which puts them behind the wall, 'cause your brain can tell where the wall is and it can tell how far away the object is. And it's like, that's behind the wall and I can still see it. And your brain has no real ability to process that. It's never seen it before. I've never seen an object behind a solid wall. And glass is not handled at all. It's very difficult, 'cause the visible cameras can see through the glass, but for the infrared cameras that are in the depth cameras, and all the other infrared cameras that are also a part of the sensor set, glass is opaque to infrared, so now they're getting different information. The visible cameras can do stereo disparity all the way out the window, and the depth camera is not agreeing with it. Which one do you use? And then you get the almighty mirror: what do you do with a mirror? Do you render it twice? Can you even detect the mirror? How about a mirror at an angle? How about a mirror that's just a reflection on a piece of glass? And if any of that's missing, your brain just rejects it. Like, that's not real. My dream of AR is that it's movie-quality special effects in real time. Movie special effects really are pre-rendered AR. They're literally augmented reality. They filmed it, they added stuff to it, they make it blend perfectly. While I haven't worked in movies, you see the amount of effort that goes into doing this. They'll rebuild things at a different scale. They will render or model an entire real scene in incredible detail just so they can put the special effects in it.
They'll match cameras, ray-tracing cameras to physical cameras, and all things like that, all to get the perfect shot, so that the augmented parts of the effect and the real parts of the effect are seamlessly blended. Reflections are perfect. Shadows are perfect. Whether it be real shadow on fake object or fake shadow on real object, it's all perfect. It gets better all the time. So my goal with AR was, like, this will be cool if we get to do all these things. And then you start working on the technical bits, and you can solve A, you can solve B, but they don't work together. Then you solve C, which totally replaces A. It's just this progression of technology. Like I said, every little bit can be solved in very specific circumstances, and none of them play nice together. So AR to this day is still horribly broken. Factor in that there are no killer apps, there's no ecosystem. Like Magic Leap, what do they have? Why would I buy this? Why would I use your chat program when I could use text or I could use iMessage? Why would I invest in your ecosystem when it's an entirely new platform and we already have existing platforms? It's the platform play problem. And I think, going back to VR, Oculus have messed this up too. Why is there not a 3D avatar version of WhatsApp? Why can't I have a VR version of Facebook? Without that ecosystem, again, it's a hard sell. It's just a tech demo. And Oculus have done a lot better than just a tech demo, to be fair to them. But Magic Leap was always just a tech demo. The ecosystem part brings us nicely up to the 800-pound gorilla in the room, which is the Vision Pro.
PJ:And that concludes our first episode on augmented reality. To be continued in episode two of augmented reality. See you all soon.