High-Availability Cloud-Based Voice over IP Communications
Written by Ahsan Baig, Mike Carvalho, Jim Willows, and Philip Bockrath
In this article we describe our experience implementing a high-availability, multi-cloud VoIP system for AC Transit, the third-largest public bus system in California. The AC Transit service area includes portions of Alameda and Contra Costa counties, serving 13 cities and unincorporated areas from San Pablo south to South Fremont. AC Transit also provides commuter service across the Bay to San Francisco, San Mateo, and Santa Clara counties. The service comprises more than 151 bus routes across a 364-square-mile service area serving 1.5 million people.
Real-time voice communication among the Operations Control Center (OCC), bus operators, and field supervisors is critical to maintaining service reliability and safety. Historically, AC Transit's voice communications relied on a traditional Land Mobile Radio (LMR) system. When the decision point arrived, the options were to invest millions of dollars in replacing and upgrading the LMR system, including the long-haul communication infrastructure and end-user radio equipment, or to deploy newly developed, software-based Voice over IP (VoIP) technology over the commercial 5G cellular network. As the computer industry moves toward Internet Protocol (IP) based voice connectivity, IP offers device interoperability, system reliability, spectrum efficiency, and wide coverage at a cost-effective price. Based on extensive market research and an assessment of technology maturity, AC Transit concluded that the VoIP option was more stable, more reliable, and less expensive than traditional analog radio solutions.
AC Transit designed its cloud infrastructure following industry best practices. Two privately owned data center sites host the VoIP infrastructure for high availability and resiliency, with full redundancy in connectivity and failover capability; this technology stack is the foundation of our private VoIP cloud. The corporate office hosts the Radio over IP (RoIP) equipment and provides the core switch for Local Area Network (LAN) devices. One of the bus divisions hosts the UHF Land Mobile Radio equipment, which serves as a secondary backup. The Microsoft Azure cloud environment hosts the third-party apps used for remote connectivity and emergency communications.
A scalable, secure, and resilient network infrastructure is key to the successful operation of this VoIP system. Each layer of the network has a specific role, and together with adequate bandwidth, Quality of Service (QoS), and same-day hardware and software service contracts, these layers keep the system running continuously and reliably. Our LANs provide connectivity for servers, dispatch consoles, mobile devices, and user workstations. Multiple LANs are configured within the VoIP system, each consisting of core switches, distribution switches, access switches, cabling plant, and fiber optic backbone, and they are located at our data centers, corporate office, bus divisions, and even on our buses. Every network device must support QoS. QoS configurations provide traffic prioritization and resource reservation controls that place VoIP traffic above all other IP traffic, because speed and reliability are essential to reliable VoIP communications.
The Wide Area Network (WAN) connects our data centers, corporate office, and bus divisions. This high-speed backbone provides continuous, reliable communications for all our VoIP technology and includes point-to-point circuits, leased Ethernet circuits, point-to-point fiber optic connections, and virtual private networks (VPNs) over the Internet.
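The traffic prioritization described above can be illustrated with a strict-priority scheduler: packets marked as voice are always sent before any other traffic. This is a minimal sketch, not AC Transit's switch configuration; the class names are hypothetical, and treating DSCP 46 (Expedited Forwarding) as the voice marking is an assumption for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

EF = 46  # DSCP Expedited Forwarding; assumed here to be the voice marking


@dataclass
class Packet:
    dscp: int      # DSCP value carried in the IP header
    payload: str   # label used only to trace ordering in this sketch


class StrictPriorityScheduler:
    """Hypothetical two-queue output scheduler: EF-marked packets always go first."""

    def __init__(self) -> None:
        self.voice: Deque[Packet] = deque()
        self.best_effort: Deque[Packet] = deque()

    def enqueue(self, pkt: Packet) -> None:
        # Classify on DSCP: EF goes to the priority queue, everything else to best effort.
        if pkt.dscp == EF:
            self.voice.append(pkt)
        else:
            self.best_effort.append(pkt)

    def dequeue(self) -> Optional[Packet]:
        # Drain the voice queue first; serve other traffic only when it is empty.
        if self.voice:
            return self.voice.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None


if __name__ == "__main__":
    sched = StrictPriorityScheduler()
    for p in [Packet(0, "web"), Packet(EF, "rtp-1"), Packet(0, "backup"), Packet(EF, "rtp-2")]:
        sched.enqueue(p)
    order = []
    while (p := sched.dequeue()) is not None:
        order.append(p.payload)
    print(order)  # ['rtp-1', 'rtp-2', 'web', 'backup']
```

In practice, network equipment usually polices the priority queue to a configured rate so that voice cannot starve all other traffic; this sketch omits that cap to keep the prioritization logic visible.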
Figure 1 - AC Transit Wide Area VoIP Communications Solution
Hybrid Cloud Implementation
A hybrid cloud setup, combining public and private clouds, provides resiliency and a cost-effective way to manage resources, both of which are critical to our VoIP implementation. Various backup technologies and systems keep operations running in the event of hardware failures, network outages, or software issues. A dual data center configuration maximizes system uptime: each data center has all the server and network infrastructure needed to operate the VoIP system independently. The data centers use an N+2 redundancy scheme for power conditioning, power generation, UPS battery backup, and equipment cooling. They are geographically separated, served by different power grids, and exposed to different natural-disaster risks.
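To see why N+2 redundancy and a second, independently capable data center raise uptime, consider a simple availability model that assumes failures are independent. The sketch below computes the probability that at least N of N+2 units are working, and the availability of two sites where either one can carry the full load. The unit availability, site availability, and value of N are hypothetical numbers, not AC Transit measurements.

```python
from math import comb


def k_of_n_availability(k: int, n: int, a: float) -> float:
    """Probability that at least k of n independent units (each with availability a) are up."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def parallel_availability(*sites: float) -> float:
    """Availability of independent sites when any single site can carry the full load."""
    p_all_down = 1.0
    for a in sites:
        p_all_down *= 1 - a
    return 1 - p_all_down


if __name__ == "__main__":
    unit = 0.99   # hypothetical availability of one UPS or cooling unit
    needed = 2    # hypothetical N: units required to carry the load
    print(f"N+0: {k_of_n_availability(needed, needed, unit):.6f}")
    print(f"N+2: {k_of_n_availability(needed, needed + 2, unit):.6f}")
    site = 0.999  # hypothetical availability of one data center
    print(f"two sites: {parallel_availability(site, site):.6f}")
```

The independence assumption is exactly why geographic separation and different power grids matter: a shared failure cause (one grid, one disaster zone) would make the real figures far worse than this model suggests.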
AC Transit operates a highly resilient routed network consisting of an EIGRP core with OSPF branches for vendor-neutral compatibility, plus several static and BGP-based networks redistributed into the core at strategically chosen points. This autonomous system recovers automatically from any single point of failure affecting our most critical traffic. Automatic mechanisms, such as IP Service Level Agreement (SLA) probes with object tracking and customized administrative distances for static routes, preconfigure the routing system for each failure condition so that no human intervention is needed to recover from a link or device failure. This extends to the remote VoIP data centers, which are in two geographically distinct locations and use dual BGP circuits that do not integrate directly into the organization's core EIGRP and OSPF routing. Despite that complication, these automatic mechanisms fail over the data center networks from AC Transit's headquarters as smoothly as the failover built into EIGRP itself.
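The interplay of SLA tracking and administrative distance can be modeled in a few lines: a static route tied to a tracking object is withdrawn while its probe reports the path down, and among the remaining routes the one with the lowest administrative distance wins. This is a simplified sketch, not router configuration; the prefix, next-hop names, tracking-object names, and distance values are hypothetical, and it assumes all candidates are for the same prefix (a real router applies longest-prefix match before comparing administrative distance).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Route:
    prefix: str
    next_hop: str
    admin_distance: int
    track: Optional[str] = None  # name of an SLA tracking object, if any


def select_route(candidates: List[Route], track_state: Dict[str, bool]) -> Optional[Route]:
    """Pick the usable route with the lowest administrative distance.

    A route bound to a tracking object is ignored while that object is down
    (or unknown), mimicking a static route tied to an IP SLA probe.
    """
    usable = [r for r in candidates if r.track is None or track_state.get(r.track, False)]
    return min(usable, key=lambda r: r.admin_distance, default=None)


if __name__ == "__main__":
    routes = [
        Route("10.50.0.0/16", "dc1-bgp-edge", 1, track="sla-dc1"),  # primary, tracked
        Route("10.50.0.0/16", "dc2-bgp-edge", 5, track="sla-dc2"),  # secondary, tracked
        Route("10.50.0.0/16", "internet-vpn", 250),                 # floating last resort
    ]
    for state in ({"sla-dc1": True, "sla-dc2": True},
                  {"sla-dc1": False, "sla-dc2": True},
                  {"sla-dc1": False, "sla-dc2": False}):
        print(state, "->", select_route(routes, state).next_hop)
```

Because every failure case is decided by precomputed distances and probe results, no operator has to touch the configuration when a circuit or edge device fails.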
End-to-End Quality of Service (QoS)
To ensure reliable voice service even when the network is heavily saturated, the organization uses a single, comprehensive QoS marking, queueing, and shaping policy, mirroring the single, network-wide approach of our routing policy. The policy is built on Differentiated Services Code Point (DSCP) markings. First deployed as part of the organization's original VoIP phone implementation more than ten years ago, the QoS marking and queueing scheme has been updated over the years as capacity improved and required throughput increased. AC Transit achieves latency of about 50 ms or less for most internal voice calls and jitter below 20 ms, well within the requirements for high-quality voice and video calls.
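DSCP markings are usually applied by phones, dispatch consoles, or the access switches themselves. As an illustration only, the sketch below shows how an application endpoint could mark its own UDP voice packets with Expedited Forwarding using Python's standard `socket` module. The helper names are hypothetical, and it assumes the network trusts endpoint markings; many networks re-mark traffic at the access edge according to their own trust policy. On Linux and macOS the `IP_TOS` option sets the IPv4 TOS byte; other platforms may ignore it.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding (RFC 3246), the usual marking for voice bearer traffic


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper 6 bits of the former IPv4 TOS byte (RFC 2474)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def open_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

For example, datagrams sent with `sendto` on the socket returned by `open_marked_udp_socket()` carry a TOS byte of 184 (0xB8), which every QoS-enabled hop can then place in its priority queue.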
Applying one consistent QoS policy across the network lets us achieve excellent voice quality in terms of latency and jitter. Despite the difficulty of integrating remote data centers that use different routing protocols and link technologies, AC Transit operates a single autonomous system that is highly available for all voice communications, thanks to strict adherence to these routing and QoS policies.
System Performance - Where the Rubber Meets the Road
Designing any safety-critical communication solution involves many factors, including security, reliability, redundancy, flexibility, scalability, and authentication. Like many other public transit agencies, AC Transit handles incidents around the clock, ranging from mundane to life-threatening. When critical events occur, reliable voice communication must be in place to support situational awareness, and it must keep meeting demand even when failures occur.
Figure 2 - AC Transit Service Area
AC Transit's VoIP communication system has been in production for almost two years. Thorough network design, implementation, and validation testing are critical to ensuring that end users experience highly reliable communications. Throughout the design, evaluation, and deployment phases, every link in the network was evaluated for latency, jitter, and packet loss, including fixed-end networks, broadband links, and on-vehicle solutions.
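Jitter figures like those above are commonly computed with the interarrival jitter estimator that RFC 3550 (section 6.4.1) defines for RTP. The sketch below assumes each packet is described by its send and arrival times in milliseconds (RFC 3550 itself works in RTP timestamp units); it illustrates the calculation and is not the specific tool used in AC Transit's testing.

```python
from typing import Iterable, Optional, Tuple


def rfc3550_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """Interarrival jitter estimate in the style of RFC 3550, section 6.4.1.

    Each item is (send_time_ms, arrival_time_ms) for one packet, in arrival order.
    D is the change in one-way transit time between consecutive packets, and the
    running estimate is smoothed as J += (|D| - J) / 16.
    """
    jitter = 0.0
    prev_transit: Optional[float] = None
    for sent, arrived in packets:
        transit = arrived - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


if __name__ == "__main__":
    # Hypothetical trace: 20 ms packetization, transit alternating between 50 and 60 ms.
    samples = [(t * 20.0, t * 20.0 + (50.0 if t % 2 == 0 else 60.0)) for t in range(200)]
    print(f"jitter estimate: {rfc3550_jitter(samples):.2f} ms")
```

Only differences in transit time enter the formula, so a constant clock offset between sender and receiver cancels out; that is why this estimator can be computed without synchronized clocks.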
The AC Transit service area was evaluated with testing based on the ITU-T G.107 standard (the E-model), using a grid of 0.25 mi x 0.25 mi cells. The collected data was analyzed with both spatial and temporal tools to confirm that DSCP QoS markings were honored, so that critical packet streams received expedited forwarding.
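ITU-T G.107 rates a voice path with the E-model's transmission rating factor R, which maps to an estimated mean opinion score (MOS). The full model has many inputs; the sketch below uses a commonly cited simplification in which the G.107 default values give a base rating of about 93.2, one-way delay adds a delay impairment Id, and codec and packet-loss effects are folded into an equipment impairment Ie supplied by the caller. It is an approximation for illustration, not the test procedure AC Transit used; the function names are hypothetical.

```python
def delay_impairment(one_way_delay_ms: float) -> float:
    """Id in the simplified E-model: small linear penalty, steeper beyond about 177 ms."""
    d = one_way_delay_ms
    return 0.024 * d + (0.11 * (d - 177.3) if d > 177.3 else 0.0)


def r_factor(one_way_delay_ms: float, equipment_impairment: float = 0.0) -> float:
    """R = 93.2 - Id - Ie, where 93.2 reflects the G.107 default parameter values."""
    return 93.2 - delay_impairment(one_way_delay_ms) - equipment_impairment


def r_to_mos(r: float) -> float:
    """Map R to an estimated MOS using the conversion given in ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


if __name__ == "__main__":
    for delay in (50.0, 150.0, 250.0):
        r = r_factor(delay)
        print(f"{delay:.0f} ms one-way: R = {r:.1f}, MOS = {r_to_mos(r):.2f}")
```

The E-model's delay term refers to mouth-to-ear delay, which includes codec and jitter-buffer delay in addition to network latency, so feeding it network measurements alone will understate the impairment somewhat.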
A multi-cloud VoIP implementation built on geographically diverse data centers provides redundancy and high availability for the network. The QoS design and the tight delay targets required for voice quality were achieved through detailed network traffic engineering. A distinctive aspect of this implementation is the paradigm shift it represents for a public transit agency: the agency can stay focused on providing safe and reliable mobility services instead of spending scarce technical resources on developing and maintaining a private LMR system.
This article was edited by Shafi Khadem
For a downloadable copy of the August 2021 eNewsletter which includes this article, please visit the IEEE Smart Cities Resource Center.