To build a custom map type, you need to understand three coordinate systems and how they fit together. Points on the globe are described by a simple latitude and longitude. A projection accepts a lat/lng pair and returns a planar x/y coordinate, which we can call a pixel to mimic the Google Maps scheme. In this scheme, the possible range of x/y values depends on the zoom level z; in particular, 0 <= x,y < 256*2^z. The fromLatLngToPixel and fromPixelToLatLng methods translate back and forth between these two systems.
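As a rough illustration of the two conversions, here is a minimal sketch that assumes a spherical Mercator projection. That choice is an assumption made for the example; a custom map type may well use a different projection. Plain {lat, lng} and {x, y} objects stand in for the API's own classes, and the constant TILE_SIZE is a name introduced here.

```javascript
// Sketch only: spherical Mercator is assumed, not required by the API.
const TILE_SIZE = 256;

// lat/lng in degrees -> pixel coordinates at the given zoom level
function fromLatLngToPixel(latlng, zoom) {
  const size = TILE_SIZE * Math.pow(2, zoom); // world width and height in pixels
  const x = (latlng.lng + 180) / 360 * size;
  const sinLat = Math.sin(latlng.lat * Math.PI / 180);
  // y grows downward, so the north edge of the map is y = 0
  const y = (0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI)) * size;
  return { x: x, y: y };
}

// pixel coordinates at the given zoom level -> lat/lng in degrees
function fromPixelToLatLng(pixel, zoom) {
  const size = TILE_SIZE * Math.pow(2, zoom);
  const lng = pixel.x / size * 360 - 180;
  const n = Math.PI * (1 - 2 * pixel.y / size);
  const lat = Math.atan(Math.sinh(n)) * 180 / Math.PI;
  return { lat: lat, lng: lng };
}
```

With these definitions, the point at latitude 0 and longitude 0 lands in the middle of the single zoom 0 tile, and converting a pixel back should return the original lat/lng up to floating-point error.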
The API also needs to translate a tile index into a unique URL. This is the purpose of the getTileUrl method. A simple way to do this mimics Google's tilespace coordinate system. At zoom level 0, there is one tile indexed (0,0,17). At zoom level 1, there are four tiles indexed left to right, top to bottom by (0,0,16), (1,0,16), (0,1,16), and (1,1,16). At zoom level n, there are 4^n tiles, each indexed by a triplet (x,y,17-n). The numbers x and y must satisfy 0 <= x,y < 2^n and indicate the tile's position relative to the other tiles at the same zoom level. I believe that the 17-n, rather than a simple n, is a relic relating the current zoom levels to the API version 1 zoom levels.
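The indexing above can be sketched in a few lines. The helper names pixelToTile and tileTriplet, the constant TILE_PIXELS, and the example.com URL layout are all made up for this illustration; only the (x,y,17-n) convention and the 256-pixel tile size come from the description above.

```javascript
// Sketch only: names and URL layout are hypothetical.
const TILE_PIXELS = 256;

// Which tile contains a given pixel at this zoom level
function pixelToTile(pixel, zoom) {
  return {
    x: Math.floor(pixel.x / TILE_PIXELS),
    y: Math.floor(pixel.y / TILE_PIXELS),
    zoom: zoom
  };
}

// Build the (x, y, 17 - n) triplet, checking 0 <= x,y < 2^n
function tileTriplet(tile) {
  const perSide = Math.pow(2, tile.zoom);
  if (tile.x < 0 || tile.y < 0 || tile.x >= perSide || tile.y >= perSide) {
    throw new RangeError("tile index out of range for zoom " + tile.zoom);
  }
  return [tile.x, tile.y, 17 - tile.zoom];
}

// Hypothetical URL layout; substitute whatever your tile server expects
function getTileUrl(tile, zoom) {
  const t = tileTriplet({ x: tile.x, y: tile.y, zoom: zoom });
  return "http://example.com/tiles?x=" + t[0] + "&y=" + t[1] + "&zoom=" + t[2];
}
```

For example, getTileUrl({ x: 1, y: 1 }, 1) produces a URL ending in x=1&y=1&zoom=16, matching the last of the four zoom level 1 tiles listed above.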
You can use the map below to play with these parameters and to determine Google's URL for a particular tile. It's pretty easy to see how to translate a tile index into a Google Maps URL, and my getTileUrl functions mimic this strategy. Alternative strategies are certainly possible. Google's satellite images seem to use a string whose length is the zoom level and whose characters indicate the sequence of quadrants chosen at each zoom to reach the tile (q=NW, r=NE, s=SE, and t=SW).
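The quadrant-string idea can be sketched as follows. The function name tileToQuadString and the lookup table are introduced here for illustration, and the letter assignment (clockwise from the top left: q=NW, r=NE, s=SE, t=SW) is an assumption about Google's scheme rather than something documented; the letters live in one table so they are easy to change.

```javascript
// Sketch only: the letter assignment below is an assumption.
// Table is indexed by (east bit) + 2 * (south bit): NW, NE, SW, SE
const QUADRANT_LETTERS = ["q", "r", "t", "s"];

// Convert a tile index (x, y) at the given zoom level into a string
// with one quadrant letter per zoom step, coarsest step first
function tileToQuadString(x, y, zoom) {
  let result = "";
  for (let level = zoom; level > 0; level--) {
    const mask = 1 << (level - 1);
    const east = (x & mask) ? 1 : 0;  // 0 = west half, 1 = east half
    const south = (y & mask) ? 1 : 0; // 0 = north half, 1 = south half
    result += QUADRANT_LETTERS[east + 2 * south];
  }
  return result;
}
```

Each pass of the loop reads one bit of x and one bit of y, starting from the most significant, so the first letter picks a quarter of the whole map and each later letter picks a quarter of the previous choice.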