NYC Mesh OSPF Routing Methodology
Positives and negatives
OSPF is an interesting choice as an in-neighborhood routing protocol because of its ease of setup (auto convergence, no ASNs), and how ubiquitous it is -- nearly every cheap and expensive commercial and open device supports it. These two positives alone make OSPF worth considering.
On the downside, it is not specifically designed for an ad hoc mesh, it trusts blindly, and it has very few tunables. Additionally, there are a few technical challenges such as the lack of link-local address use, only advertising connected networks (not summaries), and some common defaults on various platforms.
Many of these challenges can be overcome by taking some care to make good choices for options when setting up a network.
OSPF Selection
NYC Mesh has chosen OSPF as its standard mesh routing protocol. This may be a controversial choice, as _most_ mesh networks in Europe use custom mesh routing protocols or encrypted routing protocols. We have chosen this path because:
- OSPF is an open standard with implementations on many platforms, open and closed, including cheap older professional switches
- OSPF hugely reduces the burden for installers and members to maintain the network
- OSPF cooperates well with other protocols such as BGP
- Other Mesh networks (CTWUG in South Africa for example) have scaled OSPF to 1000+ routers.
NYC Mesh utilizes a wide range of hardware with differing capacities and weather resiliency characteristics. Being volunteer-driven and operated, it's important that the network be resilient, but also easy to maintain and scale. OSPFv2 Point-to-Multipoint allows us to modify routing tables and plan for expansion without overly-complicated configuration planning.
Designing the basic architecture
To standardize across the network, each router has a Mesh Bridge Interface on the OSPF Area with default cost of 10 to all adjacent neighbors. This ensures symmetry in link costs on both ends of the link, keeping bi-directional traffic following the same path. For each "hop" to an internet exit, each router incurs its link cost to transit to the next hop. By calculating the lowest cost to an internet exit, the local router automatically sends its traffic to an internet exit in (usually) the most efficient manner.
Example: Node path to internet exit with all default costs
Node A > 10 > Hub > 10 > Supernode > 1 > Public Internet
In the above example, the Node incurs cost 10+10+1=21 to exit to the Public Internet. Unless a lower cost exit becomes available, this will be the preferred route for all internet traffic to and from Node A.
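The arithmetic above can be sketched in a few lines of Python. The function name `path_cost` and the list representation of a path are illustrative assumptions, not part of any NYC Mesh tooling.

```python
def path_cost(link_costs):
    """Return the total OSPF cost of a path, given the cost of each hop."""
    return sum(link_costs)

# Node A > 10 > Hub > 10 > Supernode > 1 > Public Internet
node_a_exit = [10, 10, 1]
print(path_cost(node_a_exit))  # 21
```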
Now that we've standardized route costs, we need to design priority and redundancy to take advantage of nodes clustered around each other while preferring higher-capacity links.
The WDS bridge: ensuring Hub-and-spoke routes are preferred over WDS routes
NYC Mesh uses Omnitik wireless routers at almost all member nodes to automatically connect to each other, providing numerous backup routes in case of hardware failure or network changes, but these connections are often slower and less reliable than point-to-point and point-to-multipoint connections in our Hub-and-Spoke model. To account for this, we put the Omnitik<>Omnitik WDS links on a separate "WDS Bridge" on every Omnitik router with default cost of 100.
Example: Node preferring Mesh Bridge over "shorter" WDS links
Node A > 100 (WDS) > Node B > 10 > Supernode X > 1 > Public Internet
Node A > 10 > Microhub > 10 > Hub > 10 > Supernode Y > 1 > Public Internet
In this example, Node A prefers to exit via Supernode Y as the cost it incurs is 31, versus 111 via Supernode X. If we did not have higher WDS costs, Node A would instead prefer the shorter link to Supernode X, but would very likely experience poorer performance.
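As a rough illustration of how the lowest-cost exit falls out of these numbers, the sketch below runs a plain shortest-path search (Dijkstra's algorithm) over the two example paths. The graph layout and the helper names `build_graph` and `lowest_cost_path` are assumptions made for this example; only the link costs come from the text above.

```python
import heapq

MESH_COST = 10   # default cost on the Mesh Bridge
WDS_COST = 100   # default cost on the WDS Bridge

# Undirected links from the example above: (router, router, cost)
LINKS = [
    ("Node A", "Node B", WDS_COST),
    ("Node B", "Supernode X", MESH_COST),
    ("Supernode X", "Public Internet", 1),
    ("Node A", "Microhub", MESH_COST),
    ("Microhub", "Hub", MESH_COST),
    ("Hub", "Supernode Y", MESH_COST),
    ("Supernode Y", "Public Internet", 1),
]

def build_graph(links):
    """Build an adjacency list, using the same cost in both directions."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph

def lowest_cost_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, path), or (None, []) if unreachable."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, router, path = heapq.heappop(queue)
        if router == target:
            return cost, path
        if router in visited:
            continue
        visited.add(router)
        for neighbor, link_cost in graph.get(router, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None, []

cost, path = lowest_cost_path(build_graph(LINKS), "Node A", "Public Internet")
print(cost, " > ".join(path))
# 31 Node A > Microhub > Hub > Supernode Y > Public Internet
```

If the WDS link is given the same cost of 10 as the Mesh Bridge links, the same search returns the path through Node B and Supernode X at cost 21, which is the outcome the higher WDS cost is meant to avoid.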
For more details on the hybrid Hub-and-Spoke + Mesh model we deploy, see the Mesh page.
Example: Prospect Lefferts Garden
In Figure 1, we see many nodes (marked as red dots) clustered around 2 Microhubs (marked as blue dots) in Prospect Lefferts Garden, as well as multiple exit routes to the north. While most of the nodes will automatically find the best exit, there are some that may have equal costs through multiple exits. To mitigate this, we set preferred routes (via hardware like SXTs, or software with virtual wireless interfaces) on the Mesh Bridge, as illustrated by the green lines in Figure 2. This ensures each node selects its fastest and most stable route to send and receive internet traffic.
We can see this in action on the Omnitik: 10.69.45.7 is on the "Mesh" bridge interface, meaning it incurs cost 10 to transit. All other adjacent routers are on the "WDS" bridge interface, and incur cost 100 to transit. This setup ensures the local node prefers the 4507 Microhub as its exit route, but also has backup routes in case 4507 goes offline or one of its upstream links is broken.
By implementing this architecture across all routers on the network, we now have high resiliency to outages, scalability, and minimal configuration effort.
Bridge Filters
Before we go further, it's important to mention bridge filters. These are required to make OSPF work properly and also ensure members within the same building are isolated from each other. There are 3 filters applied to standard Omnitik configurations.
[admin@nycmesh-xxxx-omni] > interface bridge filter print
Flags: X - disabled, I - invalid, D - dynamic
0 chain=forward action=drop in-bridge=mesh log=no log-prefix=""
1 chain=forward action=drop in-bridge=wds
2 chain=forward action=drop in-interface=wlan2
- Filter 0 prevents devices on the Mesh Bridge from directly interfacing with each other. Disabling or deleting this filter allows traffic to move freely across devices on the Mesh Bridge without first traversing the OSPF Area and incurring the OSPF cost of 10 before moving to the next router in the exit route.
- Filter 1 functions the same as Filter 0, but applied to the WDS bridge
- Filter 2 ensures that guests connected to the open -NYC Mesh Community Wifi- SSID are isolated from each other
- Note that routers without built-in wireless access points, including "core" routers within NYC Mesh, only require Filter 0
It is critically important to ensure these filters stay enabled on each router to ensure individual routers don't "bridge" connected nodes and Hubs, leading to unintended routing paths. Further explanation and examples of bridging scenarios are covered below.
Sidebar: The case against OSPF automation and summarization
Given the scale of the problem and continued growth, we must consider an important question: why not implement automation to dynamically adjust OSPF costs based on link quality, and/or utilize summarization and redistribution to simplify planning?
As mentioned above, our network design is meant in part to balance the following three goals:
- Resiliency across a widely distributed network in a dense urban environment
- Simplicity of configuration and maintenance by a 100% volunteer team of architects, engineers, coders and enthusiasts
- Scalability for future expansion
Further, NYC Mesh has no CEO, directors, or employees, and the board intentionally does not have decision-making authority over non-financial/legal matters; as documented in the NYC Mesh Commons License, the design, planning, maintenance and support of NYC Mesh is done solely by community members and volunteers. While we do have highly-skilled volunteer network engineers, the day-to-day maintenance and monitoring of the network is done by members with varied skill levels; we generally prefer easy-to-maintain solutions over highly customized configurations requiring extensive knowledge and training.
Finally, as we primarily rely on member donations to maintain and expand the network, we generally avoid high-end enterprise-grade hardware or software requiring recurring subscription fees and support contracts to minimize operational expenses. As NYC Mesh continues to grow, we may need to adopt more robust and dynamic routing and load-balancing techniques, and will look to our community to collectively decide on the path forward.
Scaling out the Hub-and-Spoke model
This baseline architecture works great in individual neighborhoods and on relatively linear routes, but with over a thousand nodes connecting to 60+ Hubs with links crisscrossing New York City, some planning and manual intervention is required to ensure stability and speed for all connected members.
In the Bed-Stuy, Bushwick, Ridgewood, and Crown Heights neighborhoods shown above in Figure 3, we have over a dozen Hubs serving hundreds of members. Efficient routing and redundancy across multiple wireless links requires further options for route cost between 10 and 100.
In efforts to minimize single points of failure in our network (hubs having only 1 exit route) and provide dedicated backup routes in cases of weather impacting high-frequency links, we deploy redundant links in a "triangle scheme" so that each hub has multiple low-cost routes to exit. To see this deployed, let's remove the nodes from the above photo and focus on the Hubs.
As we can see in Figure 4, most Hubs have 2 or more exit routes so that an outage of an individual link or Hub will not isolate any other Hub. Additional routes leading off Figure 4 allow multiple exits from both Vernon and Hex House, as well as other lower-capacity links through smaller Microhubs and nodes.
In Figure 5, we observe a similar trend as we move southwest towards Prospect Park and Supernode 3 at Industry City.
Load-balancing across varied hardware
NYC Mesh uses a broad range of purchased and donated Ubiquiti and Mikrotik hardware with varying capacity, capabilities, and rain fade resilience, and members are allowed to extend the network at will pursuant to the Network Commons License. Because our OSPF link costs are static and do not automatically increase or decrease based on link quality, limiting ourselves to just two options for link cost will quickly cause issues as the network grows. Here are just a few use cases to consider:
- Avoiding unintentional bridging of Hubs with low-capacity connections as members join and add equipment and links
- Intentional design of secondary and tertiary routes for major Hubs to mitigate rain fade and hardware failure
- Multi-antenna routes with differing performance characteristics (primarily high-capacity 60GHz links with dedicated 5GHz backup hardware)
- Minimizing impacts from misconfigured DIY and new infrastructure installations
To meet these goals, we need to set up custom link costs on backup routes as well as between high-traffic Hubs.
Example: Microhubs between Major Hubs
Our Vernon and Prospect Heights Hubs collectively carry more than 80% of NYC Mesh network traffic in Brooklyn. By design, each Hub's primary exit is through different Supernodes to the public Internet (Vernon through Supernode 10 in Manhattan, and Prospect Heights through Supernode 3 in Industry City). To allow redundancy between their exits, a dedicated 60GHz link (in teal) is deployed between the two, but Vernon and Prospect Heights also have more preferable secondary links (illustrated further below in Figure 7). This requires the link to have a slightly higher cost (in this case, 15) so that each Hub prefers other backup routes in case of primary exit link outages.
To make matters more complicated, Microhubs in between Vernon and Prospect Heights connect to both Hubs to provide their own redundancy, as illustrated in Figure 6.
Note: nodes and sector coverage have been omitted
To ensure each Microhub prefers the fastest route and they don't bridge Vernon and Prospect Heights by having additive link costs lower than the Vernon <> Prospect Heights 60GHz link, we need to manually set the backup links with higher costs.
Reminder: before setting up secondary links, double-check that the appropriate Bridge Filters are enabled on the local router. A misconfigured or disabled Mesh Bridge Filter will result in 0-cost links between neighboring nodes and Hubs, risking major network congestion or outages.
- 436
  - Primary: Prospect Heights via AF60LR PtMP (default 10 on Mesh Bridge)
  - Secondary: Vernon via Litebeam 5ac PtMP (manual 40 on OSPF area)
- Hancock 3607
  - Primary: Vernon via Litebeam 5ac PtP (default 10 on Mesh Bridge)
  - Secondary: Prospect Heights via Litebeam LR PtMP (manual 80 on OSPF area)
- St Marks 219
  - Primary: Vernon via AF60LR PtMP (default 10 on Mesh Bridge)
  - Secondary: Prospect Heights via Litebeam LR PtMP (manual 80 on OSPF area)
- 540
  - Primary: St Marks 219 via LHG60 PtMP (default 10 on Mesh Bridge)
  - Secondary: Vernon via Litebeam 5ac PtMP (manual 80 on OSPF area)
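The bridging condition can be checked as a simple sum. The sketch below compares each Microhub's combined link cost against the direct Vernon <> Prospect Heights link (cost 15). The dictionary layout and the function name `bridging_microhubs` are illustrative assumptions, and treating an equal cost as bridging is a conservative choice made for this example. 540 is left out because both of its routes lead towards Vernon.

```python
DIRECT_HUB_LINK_COST = 15  # Vernon <> Prospect Heights 60GHz link

# Microhub -> (primary link cost, secondary link cost), one link to each Hub
MICROHUB_LINKS = {
    "436": (10, 40),           # Prospect Heights primary, Vernon secondary
    "Hancock 3607": (10, 80),  # Vernon primary, Prospect Heights secondary
    "St Marks 219": (10, 80),  # Vernon primary, Prospect Heights secondary
}

def bridging_microhubs(microhub_links, direct_cost):
    """Return Microhubs whose two link costs add up to no more than the direct Hub link."""
    return [
        name
        for name, (primary, secondary) in microhub_links.items()
        if primary + secondary <= direct_cost
    ]

print(bridging_microhubs(MICROHUB_LINKS, DIRECT_HUB_LINK_COST))  # []
```

An empty list means none of the listed Microhubs offers a path between the two Hubs that is as cheap as the dedicated link.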
Determining link costs
Note that there is no firm methodology or formula for calculating optimal custom link costs in this model, though backup links are generally set between 20 and 80 depending on upstream impacts. Sufficient buffer should be allocated between primary and secondary routes to allow expansion and updates with minimal changes required to upstream OSPF costs or routes.
The NYC Mesh Node Explorer tool generates live and historical mappings of nodes and link costs, allowing us to quickly determine primary exit routes and costs. It also includes an outage simulator that is extremely helpful in validating the configuration of secondary routes.
You can also determine current exit cost of any Mikrotik router running RouterOS v6 with the following command:
[admin@nycmesh-xxxx-core] > routing ospf route print where dst-address=0.0.0.0/0
# DST-ADDRESS STATE COST GATEWAY INTERFACE
0 0.0.0.0/0 ext-1 20 10.70.253.xx bond1.1010
Note: identifying information has been removed from this terminal export
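If the exit cost needs to be pulled out of this output in a script, a minimal parsing sketch might look like the following. It assumes the whitespace-separated column order shown in the sample above (index, destination, state, cost, gateway, interface); real output may differ between RouterOS versions, and the function name `parse_default_route` is hypothetical.

```python
def parse_default_route(line):
    """Parse one row of the route listing shown above into a dict.

    Assumes six whitespace-separated columns in the order:
    index, dst-address, state, cost, gateway, interface.
    """
    index, dst, state, cost, gateway, interface = line.split()
    return {
        "dst_address": dst,
        "state": state,
        "cost": int(cost),
        "gateway": gateway,
        "interface": interface,
    }

row = parse_default_route("0 0.0.0.0/0 ext-1 20 10.70.253.xx bond1.1010")
print(row["cost"])  # 20
```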
When selecting a custom link cost that may bridge segments of the network, the following factors should be taken into account:
- What is the preferred exit path for the local router?
  - This will normally be the link with the highest capacity (60 GHz), and will usually have default cost 10 on the Mesh Bridge to keep configuration simple
- Will the backup link connect to the same Hub as the primary, or a different one?
  - When the primary and secondary/backup links connect to the same upstream Hub, it's generally safe to set the backup link to cost 20 without further impacts
- What are the current primary and secondary exit routes & costs for each upstream Hub?
  - This gives us an understanding of where traffic will route along each hop of the network
- In the event of a primary link failing on an upstream Hub, will the new bridged link take priority over an existing secondary route?
  - Unless the local router is intended to override an existing upstream Hub's secondary/backup route, this defines the minimum cost of the secondary link: the primary link cost + secondary link cost + exit cost after the secondary link should be greater than the existing exit cost at the preferred Hub. This ensures that if the upstream Hub's primary route is interrupted, it will continue to use its existing preferred backup route.
- (Optional) For the local router, is the upstream Hub's secondary exit preferred over the local router's secondary route?
  - In some cases, the secondary route of the upstream Hub may have bandwidth constraints or other limitations that make the local router's backup more preferable in the event that the upstream Hub's primary exit is interrupted. In this case, the total cost of the local backup exit route should be less than the local primary link cost + the upstream Hub's secondary exit cost, but still high enough to not cause the upstream Hub to prefer the new bridged link.
  - Testing this scenario can be challenging in production environments without actively disabling preferred links; a speed test from the local router to the second-order upstream backup router is usually sufficient for planning purposes.
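The minimum-cost rule for a bridging secondary link can be written as a small check. This is a sketch under one interpretation of the rule: the "existing exit cost" is taken to be the upstream Hub's existing backup exit cost, and a tie is treated as an override. The function names are hypothetical. The sample values are the 436 link costs (10 to Prospect Heights, 40 to Vernon), Vernon's exit cost of 8+3=11 via Grand, and Prospect Heights' secondary exit cost of 15+8+3=26.

```python
def bridged_exit_cost(primary_link_cost, secondary_link_cost, exit_cost_after_secondary):
    """Cost the upstream Hub would see if it exited through the local router's backup link."""
    return primary_link_cost + secondary_link_cost + exit_cost_after_secondary

def overrides_upstream_backup(primary_link_cost, secondary_link_cost,
                              exit_cost_after_secondary, upstream_backup_exit_cost):
    """True if the bridged route would match or beat the upstream Hub's existing backup."""
    bridged = bridged_exit_cost(
        primary_link_cost, secondary_link_cost, exit_cost_after_secondary
    )
    return bridged <= upstream_backup_exit_cost

# 436: primary to Prospect Heights (10), secondary to Vernon (40),
# Vernon exit cost 11, Prospect Heights' existing backup exit cost 26.
print(bridged_exit_cost(10, 40, 11))              # 61
print(overrides_upstream_backup(10, 40, 11, 26))  # False
```

With a much lower secondary cost, for example 4, the bridged route would cost 25 and would undercut the existing backup of 26, which is the situation the rule is meant to prevent.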
Planning for Outages
Ok, to summarize, we've done the following:
- Selected OSPF for simplicity and consistency across the network
- Defined default link costs for primary and WDS links across nodes and Hubs
- Set up bridge filters to ensure OSPF works properly
- Established "triangles" for higher-capacity Microhubs serving multiple members, and adjusted costs for these links to ensure we don't bridge major Hubs
- Created secondary, tertiary, and even quaternary links for Hubs and geographically-advantageous locations to ensure failover exits
To determine route costs to set on primary and backup routes, we need to understand more about the wireless hardware used in the Mesh. The table below details characteristics of the common equipment currently in use on the Mesh.
Brand | Antenna | Band | Advertised Capacity* | Typical Capacity* | Rain Resilience | Preferred Link Distance** |
Ubiquiti | Litebeam, Litebeam LR | 5GHz | 225 Mbps+ (40 MHz width) | 75-175 Mbps | Extremely High | < 3.5km (PtP), < 2km (PtMP) |
Ubiquiti | airFiber 5XHD, LTU Long-Range | 5GHz | 425 Mbps+ (80MHz width) | 125-300 Mbps | Extremely High | < 5km (PtP), < 3km (PtMP) |
Ubiquiti | | 24GHz | 750 Mbps (100MHz width) | 750 Mbps | Very High | < 5km (PtP) |
Ubiquiti | | 60GHz | 1Gbps (1080MHz width) | 1Gbps | Medium | < 3.5km (PtP), < 2km (PtMP) |
Ubiquiti | airFiber 60 XR | 60GHz | 2.7Gbps (2160MHz width) | 2.7Gbps | Low/Medium | < 5km (PtP) |
Mikrotik | SXTsq 5AC | 5GHz | 200 Mbps+ (40 MHz width) | 75-100 Mbps | Extremely High | < 1.5km (PtP), < 500m (PtMP) |
Mikrotik | LHG 5AC | 5GHz | 200 Mbps+ (40 MHz width) | 75-125 Mbps | Extremely High | < 3.5km (PtP), < 1.5km (PtMP) |
Mikrotik | LHG 60 | 60GHz | 1Gbps (1080MHz width) | 300-600Mbps | Low/Medium | < 1.5km (PtP), < 500m (PtMP) |
Siklu | Etherhaul Kilo 8010 (licensed band) | 70GHz, 80GHz | 10Gbps (2160MHz width) | 10Gbps | High | < 5km (PtP) |
* Capacity listed is single-direction speed; Typical Capacity indicates observed performance in New York City
** Preferred Link Distance is a subjective estimate of maximum distance in dense urban areas before performance is significantly degraded
As we can see, decisions on route priority depend on the capacity of individual links, as well as link distance (for rain resiliency) and count of hops (for latency) to an internet exit. That's a lot of factors to consider! Let's see what this looks like in the real world.
Determining Primary and Backup Costs
To see how these factors are taken into account when planning for real-world deployments, let's return to Brooklyn.
Note: some additional links and hubs omitted for clarity; listed link speeds are production single-direction actuals; distances between Hubs are not to scale
Figure 7.1 illustrates links between larger Hubs in Brooklyn, with notations indicating deployed hardware, directional link capacity and physical distance. Supernodes in green serve as Internet Exits; all other Hubs are marked in blue. As mentioned earlier in this article, all OSPF link costs are symmetrical to ensure consistent bi-directional traffic flow and ease of configuration.
Because OSPF will always prefer the shortest route to an exit, it's easier to work from the outside-in (i.e., starting with the Supernodes adjacent to the Internet Exits and working our way deeper into the network). To begin planning our link costs, let's start by identifying the Public Internet exits:
- Both 713 - Supernode 3, and 1934 Grand St. have 40Gbps fiber uplinks to the Public Internet. While they carry more traffic than indicated in this diagram, there is enough capacity in each of these links to comfortably carry all NYC Mesh traffic through a single Supernode if necessary. SN3 has cost 1 to exit, and Grand has cost 3.
- 1417 - Hex House (Soft Surplus) serves a number of nodes and small Microhubs, and leverages a Wireguard VPN connection over consumer fiber internet to connect to SN3. It also serves as a backup route for Vernon and Saratoga. To ensure it doesn't take too much traffic in normal network conditions, it has a total cost of 26+1=27 to exit through SN3.
With the Public Internet exits and costs identified in red in Figure 7.2, the network will operate mostly ok with default costs of 10 (noted in grey) for all other links... until we experience outages, whether from heavy rain or rare hardware failure.
To optimize the network, we'll need to make some changes to some of our primary link costs.
- Currently, all traffic will exit through SN3. We want to change Vernon, which supports hundreds of members and dozens of Hubs, to prioritize the Siklu link over the AF60XR, increasing rain resiliency and capacity. To accomplish this, we'll need to make 2 changes:
- Decrease the Siklu link to cost 8, for a total exit cost of 8+3=11 via Grand.
- Increase the AF60XR link cost to 11, for a total backup exit cost of 11+1=12 via SN3.
- We can leave the Prospect Heights and President links as-is; these single-hop links have more than enough bandwidth to serve their upstream members. Both have a total exit cost of 10+1=11.
- With these changes in place, Vernon now prioritizes traffic correctly over its highest-capacity link, and has a dedicated high-capacity secondary failover link in case of hardware failure or a Grand outage. Additionally, Saratoga will now also follow Vernon's exit to Grand.
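The effect of the two changes on Vernon can be checked with a short calculation. The helper names `vernon_exits` and `preferred_exit` are made up for this sketch; the costs are the defaults (10 on both links) and the adjusted values (8 and 11) described above.

```python
SN3_EXIT_COST = 1    # SN3 cost to the Public Internet
GRAND_EXIT_COST = 3  # Grand cost to the Public Internet

def vernon_exits(siklu_link_cost, af60xr_link_cost):
    """Return Vernon's candidate exits as {supernode: total exit cost}."""
    return {
        "Grand": siklu_link_cost + GRAND_EXIT_COST,
        "SN3": af60xr_link_cost + SN3_EXIT_COST,
    }

def preferred_exit(exits):
    """Pick the exit with the lowest total cost."""
    return min(exits, key=exits.get)

before = vernon_exits(siklu_link_cost=10, af60xr_link_cost=10)
after = vernon_exits(siklu_link_cost=8, af60xr_link_cost=11)
print(before, preferred_exit(before))  # {'Grand': 13, 'SN3': 11} SN3
print(after, preferred_exit(after))    # {'Grand': 11, 'SN3': 12} Grand
```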
We now have traffic moving more efficiently, and can operate with plenty of overhead for traffic spikes and growth. However, if we experience heavy rainfall or suffer outages, we may still have problems. We'd also like to make sure we don't have to redesign the entire OSPF schema from the ground up every time a new Hub is stood up. We'll work to address that now:
- To prefer higher-capacity links in our secondary routes, let's change the following:
- The Saratoga AF5X link cost to Soft Surplus can be increased to 20, and the PH AF60LR link to Vernon can be increased to 15 to leave some room for growth.
- To ensure PH prefers Vernon as its secondary exit, the President <> PH link can be increased to 30 and will continue to be the secondary exit for President.
- Because Saratoga and Vernon have many Microhubs between them that risk bridging these two larger Hubs (as illustrated in Figure 4), we'll decrease that link cost to 9 to mitigate this risk.
- Since we increased the Saratoga <> Soft Surplus secondary link cost, we'll also want to increase the cost of the Vernon LTU-LR link to Soft Surplus so that Saratoga and Vernon will balance their traffic across their dedicated lower-capacity Soft Surplus links if Vernon gets isolated from both Supernodes.
- Additionally, Soft Surplus and its small number of connected nodes and Microhubs should use its weather-proof fiber VPN link as their primary exit instead of traversing less resilient wireless links.
- To address both needs, we'll increase the Vernon <> Soft Surplus LTU-LR link cost to 28.
Let's see how our network looks in Figure 7.4 now that we've put these changes in place.
We now have efficient routing in place with high resiliency backups for rain as well as high-capacity backups for hardware failures and outages!
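To sanity-check the final costs, the sketch below models the Brooklyn Hubs as a small graph and computes each Hub's lowest exit cost, with an optional list of links to treat as down. This is a simplified stand-in for illustration only, not the Node Explorer's outage simulator; the function name `exit_costs` and the data layout are assumptions, and links are assumed to cost the same in both directions as described earlier.

```python
import heapq

# Undirected links with the final costs described above: (router, router, cost)
LINKS = [
    ("SN3", "Public Internet", 1),
    ("Grand", "Public Internet", 3),
    ("Vernon", "Grand", 8),                 # Siklu
    ("Vernon", "SN3", 11),                  # AF60XR
    ("Prospect Heights", "SN3", 10),        # AF24
    ("President", "SN3", 10),               # Litebeam LR
    ("Prospect Heights", "Vernon", 15),     # AF60LR
    ("Prospect Heights", "President", 30),  # LTU-LR
    ("Saratoga", "Vernon", 9),              # AF60LR
    ("Saratoga", "Soft Surplus", 20),       # AF5X
    ("Vernon", "Soft Surplus", 28),         # LTU-LR
    ("Soft Surplus", "SN3", 26),            # Wireguard VPN over fiber
]

def exit_costs(links, down=()):
    """Lowest cost from every router to the Public Internet, skipping links in `down`."""
    down = {frozenset(pair) for pair in down}
    graph = {}
    for a, b, cost in links:
        if frozenset((a, b)) in down:
            continue
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {}
    queue = [(0, "Public Internet")]
    while queue:
        cost, router = heapq.heappop(queue)
        if router in best:
            continue
        best[router] = cost
        for neighbor, link_cost in graph.get(router, []):
            if neighbor not in best:
                heapq.heappush(queue, (cost + link_cost, neighbor))
    return best

normal = exit_costs(LINKS)
print(normal["Vernon"], normal["Saratoga"], normal["Prospect Heights"])  # 11 20 11

siklu_down = exit_costs(LINKS, down=[("Vernon", "Grand")])
print(siklu_down["Vernon"], siklu_down["Saratoga"])  # 12 21
```

With the Siklu link marked as down, Vernon falls back to its AF60XR exit via SN3 at cost 12, and Saratoga follows at cost 21.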
If you've made it this far, you should have a good understanding of how to safely manage routes and costs on the Mesh, and be able to safely add new nodes and Hubs armed with a better understanding of traffic flow across the network.
To learn how to configure OSPF routes on our Mikrotik routers, see the Point-to-Point Configuration guide to get started.
Appendix: Brooklyn Hub OSPF Costs
Note: the routing details in this document are accurate as of April 11th, 2024. Before making route changes, check the NYC Mesh Node Explorer tool and discuss in our Slack.
1340 - Saratoga
- This hub serves a large number of nodes and Microhubs, carrying 1-400Mbps traffic at any given moment.
- Its primary exit is through a 1.95Gbps AF60LR to Vernon, with total exit cost of 9+8+3=20 via Grand.
- This 2.2km 60GHz link occasionally experiences interruptions in heavy rain and snow.
- Its secondary exit is a 200Mbps AF5X to Hex House, with a total exit cost of 20+26+1=47.
- Although this 3.5km 5GHz link does not have enough capacity to consistently carry all local traffic, it is extremely resilient to weather impacts, making it preferable over a high-frequency, high-capacity antenna.
3461 - Prospect Heights
- Similar to Saratoga, this hub serves a large number of nodes, Microhubs and Hubs, carrying 2-500Mbps traffic at any given moment.
- Its primary exit is through a 750Mbps AF24 to SN3, with total exit cost of 10+1=11 via SN3.
- While this 3.8km 24GHz link has lower capacity than a similar 60GHz model, it has much better weather resiliency and experiences only a few minutes of downtime per year
- Its secondary exit is a 1.95Gbps AF60LR to Vernon, with total exit cost of 15+8+3=26 via Grand.
- This 3.3km 60GHz link is less resilient to weather, and is intended only as a failover in case the AF24 link goes down due to hardware malfunction or SN3 outage.
- Its tertiary exit is a 200Mbps LTU-LR to President - 5151, with total exit cost of 30+10+1=41 via SN3.
- Similar to Saratoga's secondary route, this 1.4km 5GHz link is extremely resilient to weather impacts.
5151 - President
- While this hub serves 15-20 members, normally only carrying ~25-100Mbps in traffic, its location and height make it worthwhile to include in our analysis as it serves as a backup to multiple Hubs.
- Its primary exit is through a 175Mbps Litebeam LR to SN3, with total exit cost of 10+1=11.
- This 2.8km 5GHz link is extremely resilient to weather impacts, and given the smaller footprint and bandwidth requirements of this Hub, is preferred over 60GHz hardware.
- Its secondary exit is a 200Mbps LTU-LR to Prospect Heights, with total exit cost of 30+10+1=41 via SN3.
- It also has a tertiary 5GHz 100Mbps exit through 1635 - Park Slope (not shown in this diagram), and is the secondary exit for that Hub.
5916 - Vernon
- The Vernon Hub is our largest and most heavily trafficked in Brooklyn, serving a very large number of nodes, dozens of Microhubs, and many Hubs as a primary exit to the Public Internet. Its location and height advantage over nearby neighborhoods make it a critical backbone of the Mesh. It typically carries 600-1200Mbps of traffic.
- Its primary exit is through a 10Gbps Siklu EtherHaul to Grand, with total exit cost of 8+3=11.
- This 4.4km licensed 70GHz link is fairly resilient to rain and snow, but does occasionally experience service degradation and interruptions in heavy precipitation.
- Its secondary exit is through a 3.2Gbps AF60XR link to SN3, with total exit cost of 11+1=12.
- Similar to Prospect Heights' secondary, this 6.9km 60GHz link, the longest in NYC Mesh production use, has poor weather resilience, and is intended only as a failover in case the Siklu link goes down due to hardware malfunction or Grand outage.
- Its tertiary exit is the 1.95Gbps AF60LR to Prospect Heights, with total exit cost of 15+10+1=26 via SN3.
- Similar to the secondary exit, this 60GHz 3.3km link is only intended as a bidirectional failover in case of multiple hardware failures and/or Supernode outages.
- Its quaternary exit is a 200Mbps LTU-LR to Hex House, with total exit cost of 28+26+1=55 via SN3.
- Although this 2km 5GHz link does not have enough capacity to consistently carry all local traffic, similar to Saratoga's secondary link, it is extremely resilient to weather impacts and provides an exit in cases of especially severe weather interrupting all other higher-capacity links.