The 4,096 “bug”
June 5th, 2011 at 4:36
If you have been using the Hypergrid, you have probably heard about the 4,096 “bug.” This post explains what the “bug” is about, how it manifests itself, and how I came to peace with it. It’s also a call to action for grid operators to consider placing their grids below cell 4,096-4,096 on the map. Starting in the previous release, D2 worlds are placed around 2,048 by the configuration tool.
UPDATE 6/10/2011: It turns out that the map breakage starts at 2,048, not 4,096.
Let me start with the observable consequences of this “bug.” (By the way, I’m quoting this word because, as you will see, I came to realize that this is not a bug.) There are at least two observable consequences:
- The first observable consequence was first reported here 3 years ago in the context of OGP work between SL and OpenSim. In short: if you try to teleport between regions that are more than 4,096 cells apart in either X or Y, the teleport succeeds, but the viewer is blanked out. No crashes, but nothing rezzes. This issue, also known as the “long jump” issue, doesn’t just affect interoperability mechanisms such as OGP and the Hypergrid; it affects teleports within the same grid too. As long as the regions are more than 4,096 cells apart, the viewer blanks out.
- The second observable issue is a lot more subtle and more difficult to notice, but it’s there for those who care to look. Here is the map of OSGrid around Wright Plaza: Do you notice something strange? No? Look again. After waiting several minutes, we get only the +/-4 map tiles around Wright Plaza. There are regions beyond that; in fact, the whole space around Wright Plaza is filled with regions. But the map doesn’t show them… unless we explicitly click on one of those water tiles, which gets us another +/-4 around the tile we clicked. (Depending on which version the sim you’re on is running, this may be +/-8 instead of +/-4.)
This happens because OSGrid is placed around 10,000-10,000 in grid coordinates. For grids whose coordinates are below 2,048, the map is always complete as far as there are regions in cells.
Very likely there are other subtle consequences of these numbers 4,096 and 2,048 which I haven’t come across yet.
Now that I have described the observable consequences, you may want to know why this happens. I don’t know the answer to that question; I have just a few clues and hypotheses. Here’s what I know.
First of all, this is not an OpenSim issue, it’s a Second Life issue. It happens to affect OpenSim only because we use the Second Life viewers while deviating from how Linden Lab uses their servers. In Second Life, all the regions are placed between 0 and 4,096 in both dimensions. In OpenSim, unaware of any limitations, grid operators started placing their grids at arbitrary coordinates above 4,096. The main culprit was OSGrid, and then other grids followed.
I wish I knew the exact technical explanation for this issue, but I don’t. It seems to be related to how the viewer represents region coordinates, but that doesn’t quite add up. Here’s the math. The viewer represents grid coordinates using 64-bit unsigned integers which are, in essence, two 32-bit unsigned integers concatenated together, one for X and one for Y. Each number represents a point on the map in meters. So, for example, the SW-most point of cell 1,000-1,000 is point 256,000-256,000. In principle, we ought to have 32 bits to represent each dimension, which would give us 2^24 cells in each dimension: that’s 2^32 divided by 2^8 (=256, the size of a region in meters). Instead, we seem to be hitting the ceiling at 2^12 cells (4,096). There’s a tempting relation here: 12 is half of 24. Where that halving comes from… I don’t know. Maybe some optimization of Second Life. Perhaps some viewer developer who has seen the viewer code could shed some light here?
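To make the arithmetic above concrete, here is a small Python sketch. It is only an illustration of the numbers in this post, not the viewer's actual code: the function names are made up, and putting X in the high 32 bits is an assumption (the post only says the two values are concatenated).

```python
REGION_SIZE_M = 256  # region edge length in meters (2^8), as described above


def cell_to_meters(cell_x: int, cell_y: int) -> tuple[int, int]:
    """Return the SW-most point, in meters, of a map cell."""
    return cell_x * REGION_SIZE_M, cell_y * REGION_SIZE_M


def pack_handle(x_m: int, y_m: int) -> int:
    """Concatenate two 32-bit unsigned values into one 64-bit value.

    Assumption: X goes in the high 32 bits and Y in the low 32 bits.
    """
    if not (0 <= x_m < 2**32 and 0 <= y_m < 2**32):
        raise ValueError("each coordinate must fit in 32 unsigned bits")
    return (x_m << 32) | y_m


def unpack_handle(handle: int) -> tuple[int, int]:
    """Split a 64-bit value back into its (x, y) meter coordinates."""
    return handle >> 32, handle & 0xFFFFFFFF


# Cells per dimension that 32 bits of meters could address in principle...
THEORETICAL_CELLS = 2**32 // REGION_SIZE_M  # 2^24
# ...versus the ceiling that is actually observed.
OBSERVED_CELLS = 2**12  # 4,096

handle = pack_handle(*cell_to_meters(1000, 1000))
```

With these definitions, `cell_to_meters(1000, 1000)` gives `(256000, 256000)`, and `unpack_handle(handle)` returns the same pair, so nothing in the 64-bit representation itself explains a ceiling at 4,096 cells.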
Whatever the issue is, it seems to pervade the viewer code base in profound ways. Even though this issue has been known for 3 years, no viewer has fixed it. At least a couple of viewer developers have attempted fixing it; one even submitted a patch to Linden Lab (see the original bug report, which still contains that patch). Unfortunately, I heard that the patch proved to break other things. In talking briefly with the Imprudence devs, they mentioned that they, too, attempted to fix the issue in the viewer and came to the conclusion that the changes were so many that it wasn’t worth the trouble.
I made peace with the issue a few weeks ago. My peace with it started when I noticed, in horror, the second observable effect: the missing map tiles. It turns out that OpenSim has a horrible hack to cope with this behavior; that’s why we see +/-4 regions around, otherwise we would see only the current region and nothing else. It’s a horrible hack, and it’s there just because OSGrid is placed above 4,096-4,096. If we didn’t have the hack there, OSGrid would have no map! This horror was the beginning of my coming to terms with what’s going on.
The issue is not a bug at all. It’s a fundamental design decision of Second Life. Second Life assumes that a grid will never be larger than 4,096×4,096 regions. That design decision is reflected pervasively in the viewer code to the point that very smart developers don’t think it’s worth the trouble changing it. If you think about it, this is not an unreasonable assumption. That’s 16,777,216 regions! Would anyone in their right mind ever operate a virtual world with 16 million SL-like regions?! Very unlikely.
The argument so far for calling it a “bug” seems to have been this: “well, if this is the basis for the 3D web, then clearly there will be more than 16 million regions, just like there are more than 16 million web servers.” True. But not on the same map! Not under the same authority! Seeing the map as one single shared space between all virtual worlds is just… wrong! The map is a grid resource; different virtual worlds should have their own maps. They should be able to have regions on the same coordinates — region placement should not require global coordination. Sure, there are interesting things that one can do with shared maps, but in my view of things, the starting point for interoperability is 1 grid = 1 map. As opposed to all grids = 1 map.
So, having finally realized that this is not a bug, but a not-unreasonable design decision of Second Life, I came to peace with it. It won’t be “fixed” because that would require rewriting the viewers to the point of being unmergeable with the official SL viewer. It would be nice if Linden Lab hadn’t made this assumption. But they did, and we have to live with it.
It turns out that “living with it” is not a big deal at all. It’s as simple as placing all the regions between 0-0 and 2,048-2,048, because that’s what the viewers expect. What could possibly make this an undesirable situation?
Perhaps OSGrid will have a hard time coordinating this change, precisely because it uses a global resource (the map) shared among hundreds of people. (This goes to show what a bad idea that is…) I hope the OSGrid administrators will also come to terms with this and force the coordinates change, so that we can finally have an open Metaverse free of hiccups.
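For grid operators who want to sanity-check their region placement against the numbers in this post, here is a rough Python sketch. The function names are hypothetical, the limits are simply the 2,048 and 4,096 figures reported above, and how the exact boundary cells behave is an assumption, since I haven’t tested the edges.

```python
MAP_SAFE_LIMIT = 2048   # map breakage reportedly starts here (see the update above)
LONG_JUMP_LIMIT = 4096  # teleports farther apart than this blank out the viewer


def is_safely_placed(cell_x: int, cell_y: int, limit: int = MAP_SAFE_LIMIT) -> bool:
    """True if a region's cell coordinates fall between 0 and the limit.

    Assumption: the limit itself is treated as unsafe.
    """
    return 0 <= cell_x < limit and 0 <= cell_y < limit


def is_long_jump(src: tuple[int, int], dst: tuple[int, int],
                 limit: int = LONG_JUMP_LIMIT) -> bool:
    """True if two regions are more than `limit` cells apart in X or in Y."""
    return abs(src[0] - dst[0]) > limit or abs(src[1] - dst[1]) > limit


# A region at 1,000-1,000 is fine; one around 10,000-10,000 is not.
regions = {"low": (1000, 1000), "high": (10000, 10000)}
misplaced = [name for name, (x, y) in regions.items() if not is_safely_placed(x, y)]
```

Here `misplaced` ends up containing only `"high"`, and `is_long_jump((1000, 1000), (10000, 1000))` is true because the two regions are 9,000 cells apart in X.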