Dnsmasq is both a DNS and a DHCP server. It is quick and efficient to run on Linux systems, and it is likely already running on your Linux box. If you need a quick DHCP server to serve multiple DHCP scopes for the different subnets in your VLANs (we all know the best practice is subnet == VLAN == broadcast domain), then dnsmasq is your go-to guy, and I prefer it over the ISC DHCPD server. This quick tutorial goes over the basics of getting it set up and running, and assumes you’re not going to use the DNS service.

Create a directory for your DHCP leases file:

sudo mkdir /opt/dnsmasq

Set up dnsmasq.conf:

#Disable the DNS server
#Setup the server to be your authoritative DHCP server
#Set the DHCP server to hand addresses sequentially
#Enable more detailed logging for DHCP
#Set your DHCP leases file location
#Create different dhcp scopes for each of the three simulated subnets here, using tags for ID
#Format is: dhcp-range=<your_tag_here>,<start_of_scope>,<end_of_scope>,<subnet_mask>,<lease_time>
#Setup different options for each of the unique subnets, since default gateways will be different
#The format for this is: dhcp-option=<your_tags_here>,<option>,<option_value> - 3 is router
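As a rough sketch of what the finished file could look like, the Python snippet below renders a dnsmasq.conf that follows the comments above. The three subnets, the tag names, the lease time, and the lease file name are all hypothetical placeholders, not values from this setup. It assumes the `set:<tag>` / `tag:<tag>` form of tagging described in the current dnsmasq man page.

```python
# Hypothetical scopes: tags, addresses, and gateways are illustrative only.
SCOPES = [
    # (tag, first address, last address, netmask, default gateway)
    ("vlan10", "10.0.10.50", "10.0.10.200", "255.255.255.0", "10.0.10.1"),
    ("vlan20", "10.0.20.50", "10.0.20.200", "255.255.255.0", "10.0.20.1"),
    ("vlan30", "10.0.30.50", "10.0.30.200", "255.255.255.0", "10.0.30.1"),
]
LEASE_TIME = "12h"                           # assumed lease time
LEASE_FILE = "/opt/dnsmasq/dnsmasq.leases"   # file name is an assumption


def render_dnsmasq_conf(scopes, lease_time=LEASE_TIME, lease_file=LEASE_FILE):
    """Return dnsmasq.conf text for a DHCP-only server with tagged scopes."""
    lines = [
        "port=0",              # disable the DNS server
        "dhcp-authoritative",  # act as the authoritative DHCP server
        "dhcp-sequential-ip",  # hand out addresses sequentially
        "log-dhcp",            # more detailed DHCP logging
        f"dhcp-leasefile={lease_file}",
    ]
    # One tagged scope per subnet.
    for tag, start, end, mask, _gateway in scopes:
        lines.append(f"dhcp-range=set:{tag},{start},{end},{mask},{lease_time}")
    # Option 3 (router) differs per subnet, so it is matched on the tag.
    for tag, _start, _end, _mask, gateway in scopes:
        lines.append(f"dhcp-option=tag:{tag},3,{gateway}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dnsmasq_conf(SCOPES), end="")
```

Review the printed output before using it, and swap in your own subnets and gateways.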

Once this is complete, enable your DHCP service to start automatically. You should also check your system's firewall/iptables service(s) to ensure you have rules allowing UDP traffic on ports 67 and 68. Alternatively, you can just flush your iptables rules and/or disable your firewall. This isn't a security blog, so I'll leave that choice to you, the person who knows your environment best.

First, allow me to say these do indeed exist: RJ-45 based 10GBaseT SFP+ modules. A company called Methode Electronics manufactures both an SFP+ based module and an X2 RJ-45 module. However, we’ll only really talk about why an RJ-45 based 10GBaseT SFP+ transceiver still isn’t practical for lengths beyond 30m with present technology.

The issues
The number one issue we have with current technology, today in 2017, is the number of transistors required for distances greater than 30m using 10GBaseT SFP+ modules. That incredible number of transistors consumes an enormous amount of energy per port, and the heat generated by such modules is monumental, to say the least. At distances greater than 30m, that heat needs to be pulled away from the circuitry. Doing so requires either large heat sinks, which increase the bulk of the switch itself, or careful consideration of airflow around the SFP+ ports, including higher speed and higher volume fans (which in turn consume more energy themselves). Both further increase the power demands of a switch using 10GBaseT SFP+ modules. X2 modules are indeed out there, but X2 is a different form factor to begin with and I won’t be discussing it here.

Why do I reference 30 meters?
Two reasons: 1. People look at Cat6a/7 for long haul connectivity, hoping to come somewhat close to the distance of multi-mode fiber optics on OM4 fiber cables. 2. Current technology at the time of this writing actually permits us to engineer a 10GBaseT SFP+ module for distances of up to 30m using about 2.5W of power per port. Once again, please look up the company Methode Electronics and their white paper on 10GBaseT SFP+ modules; it’s pretty cool stuff.
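To put the 2.5W figure in perspective, here is a back-of-the-envelope sketch of the module power budget for a fully populated switch. The 48-port count and the 1.0W figure for an optical module are assumptions made purely for comparison; only the 2.5W per port comes from the text above.

```python
PORTS = 48                 # hypothetical port count for a fully populated switch
COPPER_W_PER_PORT = 2.5    # quoted above for a 30m 10GBaseT SFP+ module
OPTICAL_W_PER_PORT = 1.0   # assumed figure for an optical SFP+, comparison only


def module_power(ports, watts_per_port):
    """Total power drawn by the pluggable modules alone, in watts."""
    return ports * watts_per_port


copper_watts = module_power(PORTS, COPPER_W_PER_PORT)
optical_watts = module_power(PORTS, OPTICAL_W_PER_PORT)

print(f"10GBaseT SFP+ modules: {copper_watts:.0f} W")
print(f"Optical SFP+ modules:  {optical_watts:.0f} W")
print(f"Extra heat to remove:  {copper_watts - optical_watts:.0f} W")
```

Under those assumptions the copper modules alone draw 120W, and that is at the 30m design point, before pushing the reach any further.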

Who wants this?
Now, what audience cares about using copper for distances of 100m? In the enterprise market you’ll likely never see anyone think about using copper to span distances close to 100m, especially in the data center, where the copper cross-connect is disappearing in favor of 10/25/40/50/100G fiber cross-connects because the cost of these optics is dropping fast. When I say 40G here, I am also assuming the use of Cisco 40G BiDi transceivers, because they allow you to use existing LC based fiber infrastructure. However, service providers are still interested in copper backhaul connections at distances of 100 meters: if the SFP+ modules are cheap enough, along with the cost of laying the copper, they’ll want to use this. You’ll likely see such connections at the last mile (or rather, well under a mile) or between two offices or central offices. Once again, price almost always wins; thus, time will tell. So, now you know why you’re just not seeing mass produced 10GBaseT SFP+ modules on the market.

If you’re looking to use command line variables for scripting, you have some predefined variables in the NX-OS environment, and you can also create your own. For now, I’ll just show you how to use the most common one, the switch’s hostname. In some environments you’ll have to save the output of a show tech to a file and later upload it via SCP. However, if you’re doing this on two or more switches, you’ll need unique file names to make your life easier. Instead of naming each file by hand, you can just use the variable SWITCHNAME in the file name. So, if you’re using a script or something like cluster-ssh, this makes your job easier.

sh tech all > bootflash:///shtech-$(SWITCHNAME)

I realize there is still some confusion regarding Cisco Nexus FEX as it relates to ToR connected FEX, which is a Cisco Nexus 2K FEX with a Cisco Nexus 5K/7K/9K as a parent switch, and the FEX you find in UCS, which we can refer to as “Blade-FEX”. I am going to outline what ToR (Top of Rack) FEX is in this blog post, not Blade-FEX, to help bring some clarity to this still confusing terminology. This is not meant to add any ambiguity, but it is true you can take certain Cisco Nexus 22XX ToR-FEX and “parent” them to a Cisco UCS Fabric Interconnect. I would not classify this as Blade-FEX or ToR-FEX; I’d like to coin the term “Fabric-FEX” for it. You owe me $1.00 every time you use this, send it via PayPal :). Thus, moving forward, we’re going to refer to a FEX which parents to a Cisco Nexus switch as a ToR-FEX.

Cisco Nexus FEX works thanks to the Cisco pioneered 802.1BR. Now, you don’t have to worry about configuring the gory details of what is essentially VN-TAG, because this is all handled with a few simple commands to get your FEX up and running; this is just here so you know how FEX communicates with the parent switch under the covers.

The logical representation of FEX is broken down like this:

  • Logical Interfaces (LIF) – This is simple, it’s the Eth1xx/1/X representation on the switch
  • Network Interfaces (NIF) – These are the physical uplinks connecting the FEX to the parent, carrying the VN-TAG
  • Virtual Interface (VIF) – This is the logical interface which correlates, in software, to the physical host interface. We will discuss in a minute why this makes it possible to fully swap a failed FEX without reconfiguring the host ports
  • Host Interface (HIF) – These are the physical ports on the FEX which you connect your hosts to. The parent switch assigns each HIF a unique VN-TAG ID, which is roughly correlated to the above Virtual Interface (VIF) assignment.

Here is some output to take a peek at, taken from a Cisco Nexus 9332PQ switch with 2348TQ and 2348UPQ FEX attached:

slot:36, fab_if:160001f4, p_ind:f4010016, p_numelem:1
dev_inst:0, nif_no:16, hif_no:40, nif_ind:160001f4, hif_ind:1f670a00
Eth104/1/42 0x1f670a40 Down Po501 Po501 NoConf

Take notice, this is logical port Eth104/1/42, and there is a plethora of information regarding the port, including the HIF number (hif_no) and the hif_ind. I haven’t checked anything with Cisco as of yet, but I believe the hif_no is the unique number assigned to the port, perhaps the VIF, and the hif_ind may be an index ID; I’ll investigate later. For now, just take notice that Eth[101-199]/1/[1-48] is the LIF, which is attached to a VIF, which correlates to the HIF on the FEX. Because FEX attaches the configuration to a VIF, which is also correlated to the FEX ID, you can have a FEX member, say FEX 104, fail completely, and all you need to do is replace the failed FEX and cable it the same way. Once the FEX image is downloaded, it’ll reboot and continue operation without the need to rebuild the configuration.
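If you script against these interface names, a small parser makes the LIF structure explicit. This is only a sketch: the function name `parse_lif` is made up for illustration, and the only rule it enforces is the 101-199 FEX ID range mentioned above.

```python
import re

# Matches names such as Eth104/1/42 or Ethernet104/1/42.
LIF_PATTERN = re.compile(r"^Eth(?:ernet)?(\d+)/(\d+)/(\d+)$", re.IGNORECASE)


def parse_lif(name):
    """Split a FEX logical interface name into (fex_id, slot, port)."""
    match = LIF_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a FEX logical interface: {name!r}")
    fex_id, slot, port = (int(part) for part in match.groups())
    if not 101 <= fex_id <= 199:
        raise ValueError(f"FEX ID out of range 101-199: {fex_id}")
    return fex_id, slot, port


if __name__ == "__main__":
    fex_id, slot, port = parse_lif("Eth104/1/42")
    print(f"FEX {fex_id}, slot {slot}, host port {port}")
```

Because the FEX ID is carried in the name, the same host port configuration follows a replacement FEX as long as it comes up with the same ID.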

Now, you MUST be diligent in understanding the valid UPLINK topology you can configure your ToR-FEX for, in relation to your parent switch. Always review the configuration guide for your specific model of FEX and parent switch to obtain the valid topology. In my scenarios with the Cisco Nexus 9K switches I do a single-homed, host vPC port-channel uplink topology because we can’t do a more elaborate e-vPC design with the 9K switches and our hosts will be attached with port-channels in an active-active scenario.

Finally, the configuration is simple. However, some Cisco documentation is confusing, because the wording in some documents states the UPLINK port-channel is LACP enabled, so you would assume you configure your UPLINK as an active LACP member. This is wrong. In fact, the best method, at least from my experience with the 9K switches, is to create the port-channel you’ll be using for the UPLINK, no-shut the interface, and nothing more. Then move into the physical interfaces that’ll be part of this port-channel, no-shut them, and assign them to the port-channel in static mode. Then move back into the port-channel configuration mode and build your configuration. Below is the basic configuration you need to get your FEX attached to your 9K switch:

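! Create the uplink port-channel and enable it, nothing more
interface po500
no shut
! Add the physical uplinks as static members (no LACP mode keyword)
interface eth1/21-24
channel-group 500
no shut
! Return to the port-channel and build the FEX fabric configuration
int po500
switchport mode fex-fabric
fex associate <fex-id>
no shut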

A note about setting jumbo frames on those FEX ports: the FEX host ports will assume the maximum MTU based on the UPLINK port-channel’s MTU assignment. In our environments we aim to have jumbo frames end-to-end and leave it up to the specific host/OS/application to decide on its optimal packet size. Thus, if you set the MTU on the UPLINK port-channel to 2000, the MTU on your host interface ports on the FEX will be 2000.

There has been some slight confusion and ambiguity around the “single-connection” configuration statement provided by Cisco switches and routers, including SAN MDS switches. As of this writing, Cisco Nexus 9000 NX-OS switches on 7.0(3)I5(1) code do not support single-connection in their TACACS host configuration; however, certain MDS switches do. In either case, if you do find yourself wandering here for the answer, let me elaborate for you.

The purpose of single-connection is to multiplex all of your TACACS authentication requests over a single TCP connection from the switch to the TACACS server. Using tac_plus, an open source TACACS server, you can absolutely set the single-connection bit from, say, a Cisco MDS 9706 switch; however, upon packet analysis of the TACACS authentication requests you may discover the single-connection bit is set to 0.

Refer to draft-grant-tacacs-02 and scroll to the FLAGS section for an explanation of where you will, and should, see the single-connection bit set in the TACACS flag. Basically, you’ll only ever find the bit set in the initial setup of the connection so both the TACACS server and the client agree on single-connection TCP. Thus, instead of each and every TACACS request coming through as a unique TCP connection (essentially having to use multiple sockets, sockets being the 4-tuple of SRC IP, DST IP, SRC port, and DST port) the TACACS query and response messages are just carried over the single TCP connection.
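If you want to check the bit yourself in a capture, the sketch below decodes the 12-byte TACACS+ header and reports whether the single-connection flag is set. The header layout and the flag values (0x01 unencrypted, 0x04 single-connection) are taken from the TACACS+ draft as I understand it; the function name and the sample bytes are made up for illustration.

```python
import struct

TAC_PLUS_UNENCRYPTED_FLAG = 0x01
TAC_PLUS_SINGLE_CONNECT_FLAG = 0x04
HEADER_LEN = 12  # version, type, seq_no, flags, session_id (4), length (4)


def parse_tacacs_header(data):
    """Decode the fixed TACACS+ header from the start of a TCP payload."""
    if len(data) < HEADER_LEN:
        raise ValueError("need at least 12 bytes for a TACACS+ header")
    version, pkt_type, seq_no, flags, session_id, length = struct.unpack(
        "!BBBBII", data[:HEADER_LEN]
    )
    return {
        "major_version": version >> 4,
        "minor_version": version & 0x0F,
        "type": pkt_type,            # 1 = authentication
        "seq_no": seq_no,
        "single_connect": bool(flags & TAC_PLUS_SINGLE_CONNECT_FLAG),
        "unencrypted": bool(flags & TAC_PLUS_UNENCRYPTED_FLAG),
        "session_id": session_id,
        "body_length": length,
    }


if __name__ == "__main__":
    # Fabricated first packet of a connection with the flag set.
    first = struct.pack("!BBBBII", 0xC0, 1, 1, TAC_PLUS_SINGLE_CONNECT_FLAG,
                        0x12345678, 0)
    print(parse_tacacs_header(first))
```

Run it against the first packets of a TCP connection; a 0 in later packets is expected per the explanation above.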

If your system supports this, it’s worth attempting to see if it works, as it can save some resources; however, your mileage may vary.

If you have upgraded your Cisco Nexus switches to code level 7.0(3)I2(1) or higher and had flowcontrol enabled on an interface, you’ll likely find you’re not able to do a “no flowcontrol receive on” because the command was deprecated. The current recommendation is to default the switch configuration, but I have a solution you can implement one switch at a time, with a single reload, to fix this issue:

copy run startup-config
copy startup-config <tftp: | scp:>
sh run | sed 's/flowcontrol receive on//g' >> bootflash:///no-flow-control-startup-config
copy bootflash:///no-flow-control-startup-config startup-config
! Do not save the running-config to startup-config - just reload one switch at-a-time
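If you would rather clean the configuration offline, for example on the copy you pushed to TFTP/SCP, the same substitution can be sketched in Python. The function name and sample text are hypothetical. Like the sed expression, it deletes the statement text and leaves the now-empty line in place, and it also returns a count so you can confirm how many interfaces were touched.

```python
TARGET = "flowcontrol receive on"


def strip_flowcontrol(config_text):
    """Mirror sed 's/flowcontrol receive on//g': delete the text, keep the line."""
    removed = config_text.count(TARGET)
    cleaned = config_text.replace(TARGET, "")
    return cleaned, removed


if __name__ == "__main__":
    # Fabricated sample; feed it the saved startup-config instead.
    sample = (
        "interface Ethernet1/1\n"
        "  flowcontrol receive on\n"
        "  no shutdown\n"
    )
    cleaned, removed = strip_flowcontrol(sample)
    print(f"removed {removed} statement(s)")
    print(cleaned, end="")
```

Diff the cleaned file against the original before copying it back to the switch.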

So, how do I put this? Oh yeah, I spend money on my bikes and spare absolutely no expense considering your entire life depends on the operation of just two wheels and some really tiny brakes with a lot of stopping power; thus, cheap isn’t my game, I pay to play. I am no stranger to spending money with MotoMummy either, CapitalOne and my bank account can vouch for that. Please, continue to read on because I am not just someone who is upset because their part arrived in three days, instead of two…

I have seen a lot of people get confused about the length of their double banjo bolt required for the Brembo RCS master cylinder.

When using Goodridge stainless brake lines you’ll want to purchase either of the following to ensure exact fit:

  • Goodridge: 993-03-31SD or 993-03-31SDBK short 30mm version
  • Brembo: 06.2228.22 or 06.2228.21
  • Pro-bolt: TIBANJOD10FR
  • Proti-bolt: M10L21-OT04
  • LuckyBike: 92-800-TI
  • Washers (3 total): ID: 10.5mm – OD: 14-15mm – Thickness: 1mm

Spiegler imports their bolts and makes only one size, with 35mm total length and 1.5-2.0mm in thread length using 1mm thick washers. This setup, using Goodridge lines, will not work because not only is the bolt too long, but the thread length is too long as well. Technically, you could use 2mm thick washers to take up the difference; however, there is no guarantee the banjo fittings will line up to distribute fluid properly or that the bolt will actually thread correctly.

As a word of caution, do not cut or otherwise modify the banjo bolt to fit the Brembo master cylinder! The Spiegler bolt costs around $15-$18 and the other bolts are about the same price, but you can find the Brembo for $6.00 at some local distributor or a local Ducati or Aprilia dealership. While you can say it is unlikely something will happen to your brakes when using a cut bolt, should something happen you’ll find there are fingers pointing at you saying “Not an authorized modification; if the bolt didn’t fit, you should not have used it or modified it to fit”. I don’t know about you, but when I am at the track or going down the highway, I want to know the one bolt which seals my brake lines at my $300 Brembo master cylinder is the proper fit and was not modified to fit because it was too long.

Just think about it: you paid $300, or more if you chose a different Brembo master cylinder model, and you’re willing to hack up a $15 bolt to make it fit? Buy the Brembo or Goodridge double banjo bolts to fit properly. Even better, the Pro-bolt is much nicer, is offered with a diamond-like carbon (DLC) coating too, and is already pre-drilled so you can safety wire the bolt to ensure it never vibrates loose.

See my video on this very specific topology, what I’ve encountered, and the solution I found to work for me:

So, you’ve surely seen some interesting tidbits in the previous section, things you haven’t noticed from other configurations on the Internet. I will outline why these are present in this configuration based on the failure scenario I present below:

Complete and total loss of spine connections on a single leaf switch – First I’ll outline the ONLY reasons why a single leaf switch would lose all of its spine uplinks:

  1. Total and absolute failure of the entire leaf switch
  2. The 40GbE GEM card has failed, but the rest of the switch remains operational
  3. An isolated ASIC failure affecting only the GEM module
  4. Someone falls through a single cable tray in your data center, taking out all the connections you placed in a single tray
  5. Total and complete failure of all 40GbE QSFP+ modules, at the same time
  6. Total loss of power to either the leaf switch or to all spine switches
  7. All three line cards, in three different spine switches, at the same time, suffer the same failure
  8. Someone reloaded the spine switches at the same time
  9. Someone made a configuration change and hosed your environment

OK, now, let’s make one thing clear: NO one, and I mean no one, can prevent any issue which starts with “Someone”; you can’t fix stupid. If you lose power to both of your 9396PX power supplies or to the 3+ PSUs in the 9508 spine switches, I think your problem is much larger than you care to believe. Let’s see, we now have just 5 scenarios left.

If your leaf switch just dies, well, you know. Down to four! Yes, a GEM card can fail, I’ve seen it, but this isn’t common and is usually related to an issue which will down the entire switch anyway, but we’ll keep that in our hat. Failure of all the connected QSFP+ modules at the same time? I’ll call BS on this, if all of those QSFP+ modules have failed, your switch is on the train towards absolute failure anyways.

Isolated ASIC failure? So uncommon I feel stupid mentioning it. All three line cards in the spine failing at the same time? Yeah, right. So, in all, the only real valid scenario we’re looking to handle is a GEM card failure which doesn’t also mean your switch is dead. However, please note, I am only providing this as a proof of concept, and I don’t think anyone should allow their environment to operate in a degraded state. If your environment’s operating status is important to you, perhaps consider a different choice of leaf switch for greater redundancy, a cold or warm backup switch, or at least 24x7x4 Cisco SmartNet.

When you have a leaf switch suffering a failure of all its spine uplinks, your best course of action, on a vPC enabled VTEP, is to down the vPC itself on the single leaf switch experiencing the failure. This is where the tracking objects against the IP routes, and the tracking list which groups them for use within the event manager, come into play. Once all the links have gone down, which the track list detects with a boolean AND over the removal of each BGP host address from the routing table, the event manager applet named “spine down” initiates and shuts down the vPC, loopback0, and the NVE interface, in that order.

When all the links return to operation, there is a 12 second delay, configured for our environment to allow the BGP peers to reach the established state, and then the next event manager applet, named “spine up”, initiates, basically just “un-shutting” the interfaces in the exact same order. The source-interface hold-down-timer configured on the NVE interface brings the NVE interface UP, but keeps the loopback0 interface down long enough to ensure EVPN updates have been received and the vPC port-channels come to full UP/UP status. If this didn’t happen, and the loopback0 and port-channels came up too soon before the NVE interface, we’d black hole traffic from the hosts towards the fabric. If the NVE and loopback0 interfaces come up too long before the port-channels, you’ll black hole traffic in the network-to-access direction; thus, timing is critical and will vary per environment, so testing is required.
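The behaviour of the two applets can be modelled in a few lines of Python. This is a model of the logic as described here, not NX-OS configuration: the function names, the interface labels, and the spine addresses are made up, and the only values taken from the text are the applet names, the shutdown order, and the 12 second delay.

```python
INTERFACE_ORDER = ["vpc", "loopback0", "nve"]  # order given in the text
SPINE_UP_DELAY_SECONDS = 12                    # lets BGP reach Established


def next_state(state, route_present):
    """Return (new_state, applet) given which spine host routes are present.

    state is "normal" or "isolated"; route_present maps each tracked
    spine host address to True when its route is in the routing table.
    """
    if not route_present:
        raise ValueError("at least one tracked route is required")
    all_down = all(not present for present in route_present.values())
    all_up = all(route_present.values())
    if state == "normal" and all_down:
        return "isolated", "spine down"
    if state == "isolated" and all_up:
        return "normal", "spine up"
    return state, None  # partial failures trigger nothing


def applet_steps(applet):
    """List the actions an applet performs, in order."""
    if applet == "spine down":
        return [("shutdown", name) for name in INTERFACE_ORDER]
    if applet == "spine up":
        steps = [("wait", SPINE_UP_DELAY_SECONDS)]
        return steps + [("no shutdown", name) for name in INTERFACE_ORDER]
    return []


if __name__ == "__main__":
    routes = {"10.255.0.1": False, "10.255.0.2": False, "10.255.0.3": False}
    state, applet = next_state("normal", routes)
    print(state, applet, applet_steps(applet))
```

Note that losing one or two spines does nothing in this model; only the loss of every tracked route isolates the leaf.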

A lot of stuff, right? This is all done to prevent the source interface of the NVE VTEP device coming up before the port-channels towards end hosts come up, to prevent the VTEP from advertising itself into the EVPN database and black holing INBOUND traffic.

You might be thinking: Why not just create a L3 link and form an OSPF adjacency between the two switches to allow the failed switch to continue to receive EVPN updates and prevent blackholing? Well, here are my reasons:

  1. Switchport density and cost per port – If it costs you $30,000 for a single switch with 48 10GbE ports, not including SmartNet or professional services, you’re over $600/port, and you and I both know you’re not just going to use ONE link in the underlay, you’ll use at least two. Really expensive fix.
  2. Suboptimal routing – Let’s be real here, your traffic will now take an additional hop because your switch is on the way out
  3. Confusing information in EVPN database for next-hop reachability – Because the switch with the failed spine uplinks still has a path and is still receiving EVPN updates, you’ll see it show up as a route-distinguisher in the database, creating confusion
  4. It doesn’t serve appropriate justice to a compromised switch – Come on, the switch has failed, while not completely, it is probably toast and should be downed to trigger immediate resolution of the issue, instead of using bubble gum to plug a leak in your infrastructure. The best solution is to bring down the vPC member completely, force an absolute failover to the remaining operational switch, prevent suboptimal routing, and prevent confusion in troubleshooting.
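The cost argument in reason 1 works out as follows. The switch price, port count, and two-link minimum come from the list above; counting a port on each end of every link is my own assumption about how the ports get consumed.

```python
SWITCH_COST = 30_000     # dollars, from the list above
PORTS_PER_SWITCH = 48    # 10GbE ports, from the list above
UNDERLAY_LINKS = 2       # "at least two" links between the vPC peers
PORTS_PER_LINK = 2       # assumption: one port consumed on each switch

cost_per_port = SWITCH_COST / PORTS_PER_SWITCH
ports_consumed = UNDERLAY_LINKS * PORTS_PER_LINK
total_cost = cost_per_port * ports_consumed

print(f"Cost per port:   ${cost_per_port:,.2f}")
print(f"Ports consumed:  {ports_consumed}")
print(f"Cost of the fix: ${total_cost:,.2f}")
```

That is $625 per port and $2,500 of switch ports per leaf pair, before optics and cabling, for links that only matter when a switch is already failing.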

I can’t stress this enough: engineering anything other than just failing this non-border, vPC enabled leaf switch, in the event it is the only switch that has lost all of its (at least three) spine connections, is either an attempt to design a fix for stupid, or a sign you’re far too focused on why your leaf switch has failed while ignoring the power outage in your entire data center because you lost main power and someone forgot to put diesel in the generator tanks. Part 3 will include more EVPN goodness, stay tuned!