Load balancing on JunOS

Post Author: CacheFly Team

Date Posted: November 14, 2018

One of the fun things that tends to happen when I’m coaching someone through an Anycast deployment is getting past the first proof of concept, which usually looks something like “a west coast webserver” and “an east coast webserver,” each announcing a /32 via BGP.

At this point I tend to recommend the approach we’ve used at CacheFly over the last 15 years or so: we do *not* rely on the machines themselves to handle the BGP injection. We’ve always had a designated machine or machines in each POP that performs the health checks and handles the network magic. The tip I’m going to share works great even if you do decide to use the machines themselves as injectors, but you’d still want *something* health-checking the services locally and calling the up/down scripts. (You could get away with calling them only at boot/shutdown, but that’s not recommended!)

Using separate ‘health check machines’ also works well if you have some sort of stateful load balancing equipment (eww) that you want to use behind ECMP.

I’m going to leave out the nuts and bolts of the anycast config itself, and of configuring health-checking software like Keepalived, and focus on the ECMP concept, which, once you get the hang of it, you’ll find useful and easy to deploy. Obviously you’ve already configured your FIB to load-balance per-packet, and if you’re on a QFX you’re ideally using enhanced-hash-key ecmp-resilient-hash (a quick sketch of both follows).
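If you haven’t set those up yet, here’s a minimal sketch; the policy name is my own placeholder, and note that despite its name, load-balance per-packet gives you per-flow hashing on modern JunOS platforms. The last stanza is the QFX resilient-hash knob:

policy-options {
    policy-statement LOAD-BALANCE {
        then {
            load-balance per-packet;
        }
    }
}
routing-options {
    forwarding-table {
        export LOAD-BALANCE;
    }
}
forwarding-options {
    enhanced-hash-key {
        ecmp-resilient-hash;
    }
}

Here we go: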

For the example I’m going to use, I’ll say we have 3 servers and want to weight them 2:2:1. Let’s say their actual IPs are on 172.16.1.0/24 and they are:

  • 172.16.1.101
  • 172.16.1.102
  • 172.16.1.103

The basic config is just a *static* route for your service VIP with *fake* next-hops. We’ll use 192.168.1.1 as the VIP and 10.1.1.0/24 for our fake next-hops:

routing-options {
    static {
        route 192.168.1.1/32 {
            next-hop [ 10.1.1.1 10.1.1.2 10.1.1.3 10.1.1.4 10.1.1.5 ];
            resolve;
        }
    }
}

The important part here is resolve. 10.1.1.0/24 is not reachable, and *should not be*. Use some address space that you’ll never use for any other reason.

(Please note that Cisco and other vendors automatically resolve recursive routes. JunOS, by default, will only let you point a static route at a directly connected next-hop; the resolve keyword is what enables recursive lookups on JunOS.)

Next you would have a BGP session (or OSPF, or IS-IS... I prefer to keep everything simple and keep it all in BGP, but you just need *something* to send routes to your ECMP device), with an import policy that accepts 10.1.1.0/24 up to /32s.
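On the JunOS side, that could look something like the following; the policy name, group name, and neighbor address (the health-checker’s) are placeholders of mine, and I’m assuming iBGP to match the quagga config below:

policy-options {
    policy-statement HEALTHCHECK-IN {
        term fake-next-hops {
            from {
                route-filter 10.1.1.0/24 upto /32;
            }
            then accept;
        }
        then reject;
    }
}
routing-options {
    autonomous-system 65536;
}
protocols {
    bgp {
        group health-checkers {
            type internal;
            import HEALTHCHECK-IN;
            neighbor 172.16.1.10;
        }
    }
}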

On the injector machines (your health-checkers), you want this in your quagga BGP config:

! Redistribute the kernel routes added by the up/down scripts,
! filtered down to just our fake next-hop /32s.
router bgp 65536
 redistribute kernel route-map LB
 neighbor 172.16.1.1 remote-as 65536
 neighbor 172.16.1.1 prefix-list LB out
!
! Only /32s inside 10.1.1.0/24 get announced.
ip prefix-list LB seq 5 permit 10.1.1.0/24 ge 32
!
route-map LB permit 10
 match ip address prefix-list LB
!
route-map LB deny 20
!

Hopefully you see where we’re going by now. The final step is to have your “up” and “down” scripts configured on your health-checker, which just need to do this:

server 1 up script:

/sbin/ip route add 10.1.1.1/32 via 172.16.1.101
/sbin/ip route add 10.1.1.2/32 via 172.16.1.101

server 2 up script:

/sbin/ip route add 10.1.1.3/32 via 172.16.1.102
/sbin/ip route add 10.1.1.4/32 via 172.16.1.102

server 3 up script:

/sbin/ip route add 10.1.1.5/32 via 172.16.1.103

And there you have it. Now on your switch/router you should see:

admin@device> show route 192.168.1.1
inet.0: 699161 destinations, 4883639 routes (699076 active, 0 holddown, 1316436 hidden)
+ = Active Route, - = Last Active, * = Both
192.168.1.1/32 *[Static/5] 
                      to 172.16.1.101 via irb.100
                      to 172.16.1.101 via irb.100
                    > to 172.16.1.102 via irb.100
                      to 172.16.1.102 via irb.100
                      to 172.16.1.103 via irb.100

Voila! Notice that 172.16.1.101 and 172.16.1.102 each appear twice while 172.16.1.103 appears only once: that’s the 2:2:1 weighting at work.

Your down scripts would then be the opposite, e.g. on server 1:

/sbin/ip route delete 10.1.1.1/32 via 172.16.1.101
/sbin/ip route delete 10.1.1.2/32 via 172.16.1.101

Which leaves you with:

inet.0: 699161 destinations, 4883639 routes (699076 active, 0 holddown, 1316436 hidden)
+ = Active Route, - = Last Active, * = Both
192.168.1.1/32 *[Static/5] 
                      to 172.16.1.102 via irb.100
                      to 172.16.1.102 via irb.100
                    > to 172.16.1.103 via irb.100

There you have it. You’ve now got a weightable, health-checkable ECMP load-balancing solution.
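If you’re curious what the health-checker glue might look like without pulling in Keepalived, here’s a deliberately naive sketch for server 1; the health URL, check interval, and script paths are all hypothetical, and a real deployment should use proper health-checking software:

#!/bin/sh
# Naive health-check loop for server 1 (illustration only).
# The URL, script paths, and interval below are placeholders.
STATE=down
while true; do
    if curl -sf --max-time 2 http://172.16.1.101/health >/dev/null; then
        if [ "$STATE" = "down" ]; then
            /usr/local/bin/server1-up      # runs the 'ip route add' commands above
            STATE=up
        fi
    else
        if [ "$STATE" = "up" ]; then
            /usr/local/bin/server1-down    # runs the 'ip route delete' commands
            STATE=down
        fi
    fi
    sleep 5
done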
