Spice Telephony

Overview

Spice Telephony is an attempt to create a modular and truly distributed callcenter platform. The goal is to allow callcenter resources (agents, PRI/VoIP lines, workstations) to operate together regardless of physical location as long as a network link is available, but to fall back to an orphaned mode if that link is severed. A callcenter can have multiple locations, each with local agents and some satellite remote agents. Each location joins a cluster, which effectively makes all the locations act as one big callcenter. If one of the nodes goes down or one of the network links drops, the cluster can operate with the remaining nodes, and the orphaned node can operate like a small isolated callcenter until it is able to rejoin the cluster.

Additionally, scaling a single very large callcenter would be as simple as adding another server to the local cluster and directing a proportion of the agents/calls there instead.

Spice Telephony was purposely designed to be modular so that you can swap out whole subsystems without much trouble and keep just the "core" of what you need: the queues and the skill-based routing. All the inputs/outputs to the system should be modular.

If you want to install and play around with the current implementation, see the Spice Telephony Install Guide. There's also the Spice Telephony User Guide, which covers some of the topics of this document in more detail.

Terminology

Here are some of the terms used in the system that might be unfamiliar:

  • Brand - A company or department associated with a call (for callcenters supporting multiple companies or departments, tracking this is useful and often essential).
  • Wrapup - A period at the end of a call, after the caller has disconnected, when the agent can perform some additional procedures (log case notes, send an email, etc.).
  • Warm Transfer (also known as an attended, supervised or consult transfer) - Put the caller on hold, call a third party (not another agent in the callcenter) and, if the third party agrees, bridge them with the caller and drop the agent to Wrapup.

Status

Done (sorta):

  • Basic FreeSWITCH™ integration for inbound and outbound calls
  • Basic ability for agent to set state (released<->idle, ending wrapup at end of call)
  • Single node call queueing/routing/delivery
  • AJAX agent interface
  • Management/Configuration interface
  • Distribution/Clustering
  • Support for agent to agent and agent to queue transfers
  • Call data records (CDRs)/agent state tracking
  • Call recording/archiving
  • Email and Voicemail support

Not done:

  • Support for warm transfers
  • Other media types (chat, video, fax?)
  • Everything else?

The current target for a 1.0 release is January '10, after spending December on QA and testing.

If you have feature ideas please see Spice Telephony Ideas.

Implementation

The implementation is in Erlang, a language designed for building scalable, fault-tolerant systems. VoIP is supplied via FreeSWITCH, an open source softswitch. Agents can manage their status with either a Ruby/Tk client or an AJAX web application. VoIP can be directed at a softphone, a hardphone or any landline/cellphone, as the agent prefers.

Spice Telephony system diagram
Supervisor Tree

The image on the right is an attempt to illustrate the system's design. Each bubble roughly corresponds to an Erlang process, and the whole diagram is a single Erlang 'node'. Please note that this image isn't entirely accurate or complete, but it provides a basic overview. The color-coding indicates the following:

  • Green: This process exists on all nodes in a cluster, but one of them is the "master".
  • Blue: This process appears once in the cluster, but its location is the node where it's most used.
  • Orange: This process exists only on the node where the resource it relates to exists.
  • Black: This process exists on all nodes but an instance on one node is unrelated to one on another node.

When a node or a subset of nodes leaves the cluster, a new master is elected for any green processes and any missing blue processes are spawned if they're needed. When the cluster re-unifies, one of the masters gets demoted and any duplicate blue processes migrate their data to the other one.

We make heavy use of Erlang's OTP layer, which gives us the ability to do hot code updates with zero downtime and other production friendly features.

Modules

Media Managers

Media managers are a way to translate any particular media that a callcenter can handle (voice, chat, email, video, etc.) into a generic 'call' that the callcenter platform can route. For example, our FreeSWITCH media manager simply uses mod_erlang_event to detect incoming calls and direct FreeSWITCH to transfer them to the correct agent. We also plan to implement a generic email media manager and an XMPP/Jabber media manager. Other, custom media managers could be developed by an organization for its own use.

Media managers can deal with media where the ring is either 'inband' or 'out of band'. For example, an email doesn't need to ring to a softphone, although you may prefer that it does, whereas a voice call must ring to a phone. A voice call therefore has an 'out of band' ring, in the sense that the ring doesn't travel through our callcenter platform but instead via SIP or over the PSTN. Other call types that don't need to ring to a phone can use either ring mechanism, depending on client configuration.

Agent Manager

The agent manager is responsible for handling agent logins, state changes, etc. It also acts as the central registry for all agents in the cluster. The slave agent manager on each node knows about its local agents but queries the master for any agents it doesn't know about. The agent manager has child processes that listen for new TCP or HTTP client connections and spawn a new agent process on valid login.

Agents

Agents are the representation of a callcenter agent. They consist of two components: the agent finite state machine and the agent connection. The finite state machine ensures that agents only make valid state changes and handles some additional generic agent logic. The agent connection is a layer that abstracts the transport the agent uses to communicate with the server; the current options are an HTTP client or a TCP 'fat client'.
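
As an illustration of the finite state machine idea, here is a minimal sketch in Python (not the project's Erlang code). The state names (released, idle, ringing, oncall, wrapup) appear on this page, but the exact transition table, the initial state and the names AgentFsm, ALLOWED and InvalidTransition are assumptions made for the example.

```python
# Hypothetical transition table: the state names come from this page,
# the set of allowed transitions is an assumption for illustration.
ALLOWED = {
    "released": {"idle"},
    "idle": {"released", "ringing"},
    "ringing": {"idle", "oncall"},
    "oncall": {"wrapup"},
    "wrapup": {"idle", "released"},
}


class InvalidTransition(Exception):
    """Raised when an agent attempts a state change the table forbids."""


class AgentFsm:
    def __init__(self, login):
        self.login = login
        self.state = "released"  # assumed initial state after login

    def set_state(self, new_state):
        # Reject any change that is not listed for the current state.
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state} -> {new_state}")
        self.state = new_state
        return self.state
```

With this table, an agent in wrapup cannot jump straight back to oncall; the request is rejected regardless of which connection type (HTTP or TCP) it arrived on.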

Queue Manager

The queue manager is similar to the agent manager except that it manages a directory of queues instead of a directory of agents.

Queues

Queues contain 'calls' of any type and simply have calls added and removed. Each queue is a "priority queue": within a queue, some calls can have a higher or lower priority than others, which affects their position in the queue. For example, emails could be given a lower priority so that voice calls are handled first. Queues also carry weights, so that the queues themselves can be prioritized for proper delivery order.
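
The two levels of ordering can be sketched as follows. This is a rough Python illustration, not the actual implementation; CallQueue, poll_order and the conventions that a lower priority number is served first and a higher weight is polled first are all assumptions.

```python
import heapq
import itertools


class CallQueue:
    """A queue of calls ordered by priority, then by arrival order."""

    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight              # bias between queues (assumed: higher first)
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order

    def add(self, call_id, priority):
        # Assumed convention: a lower number means a more important call.
        heapq.heappush(self._heap, (priority, next(self._arrival), call_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def poll_order(queues):
    """Return the queues in the order they would be polled for calls."""
    return sorted(queues, key=lambda q: q.weight, reverse=True)
```

For example, adding "email-1" with priority 20 and then "voice-1" with priority 10 to the same queue means "voice-1" is popped first even though it arrived later.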

Queues also have a 'recipe': a list of changes or actions to apply to a call while it is in queue, together with the conditions (for example, hold time) under which to apply them.
In English, a recipe might read something like:

  • At 10 seconds, make the call more important.
  • At 20 seconds, remove the '_node' limitation from the call so agents off site can take it.
  • Every 40 seconds, apologize to the caller for them being in queue so long.
  • At 200 seconds, and if there are no agents available, send the call to voicemail.
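
The English recipe above could be expressed as data along the following lines. This is only a sketch in Python; the Step fields, the action names and the once-per-second evaluation in due_actions are hypothetical and do not reflect the real recipe format described in the User Guide.

```python
from typing import NamedTuple


class Step(NamedTuple):
    seconds: int                     # hold time at which the step applies
    action: str                      # hypothetical action name
    repeat: bool = False             # True means "every `seconds` seconds"
    only_if_no_agents: bool = False  # extra condition on agent availability


RECIPE = [
    Step(10, "raise_priority"),
    Step(20, "remove_node_skill"),
    Step(40, "apologize", repeat=True),
    Step(200, "send_to_voicemail", only_if_no_agents=True),
]


def due_actions(recipe, elapsed, agents_available):
    """Return the actions that fire when a call has waited `elapsed` seconds."""
    due = []
    for step in recipe:
        if step.repeat:
            hit = elapsed > 0 and elapsed % step.seconds == 0
        else:
            hit = elapsed == step.seconds
        if hit and not (step.only_if_no_agents and agents_available > 0):
            due.append(step.action)
    return due
```

Assuming the function is evaluated once per second of hold time, due_actions(RECIPE, 200, 0) yields both the repeated apology and the voicemail step, while the same call with agents available only gets the apology.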

See Spice_Telephony_User_Guide#Recipes for more info.

Skills

Spice Telephony relies heavily on skill-based routing. Unlike app_queue for Asterisk, where an agent must be a member of a queue to be offered its calls, Spice Telephony uses skills to decide which agents a call can be offered to. Queues are weighted so that you can bias which queues are polled first for calls awaiting delivery.

Skills can come from a few places. For calls, they can come in via the media manager (for example, the FreeSWITCH dialplan) (TODO: document example) and from the queue the call enters (if the call leaves the queue and enters another one, it loses the skills added by the first queue). For agents, the agent account contains skills, and skills can also be applied by the profile the agent is currently in. So you can route calls primarily by profile, and agents with exceptional skills (bilingual, some specific training, etc.) can have per-agent skills as well. Switching an agent from one profile to another clears any skills applied by the old profile and adds the ones the new profile provides, while the per-agent skills are kept.
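
A small Python sketch of how these skill sources might combine is shown below. The class and function names are hypothetical, and the matching rule in can_take (an agent must hold every skill on the call) is an assumption rather than something this page states.

```python
class Call:
    def __init__(self, media_skills):
        self.media_skills = set(media_skills)  # e.g. set by the dialplan
        self.queue_skills = set()              # added by the current queue

    def enter_queue(self, queue_skills):
        # Entering a new queue replaces the skills added by the previous one.
        self.queue_skills = set(queue_skills)

    @property
    def skills(self):
        return self.media_skills | self.queue_skills


class Agent:
    def __init__(self, own_skills, profile_skills):
        self.own_skills = set(own_skills)          # per-agent skills
        self.profile_skills = set(profile_skills)  # from the current profile

    def switch_profile(self, profile_skills):
        # Old profile skills are dropped; per-agent skills are untouched.
        self.profile_skills = set(profile_skills)

    @property
    def skills(self):
        return self.own_skills | self.profile_skills


def can_take(agent, call):
    # Assumed rule: the agent needs every skill the call carries.
    return call.skills <= agent.skills
```

In this sketch, a bilingual agent keeps a per-agent "spanish" skill when moved from a sales profile to a support profile, while the skills contributed by the sales profile disappear.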

See Spice_Telephony_User_Guide#Skill_Management for more info.

Dispatch Manager

The dispatch manager manages the pool of dispatcher worker processes: it creates and destroys local dispatcher processes in response to agent availability.

Dispatcher

A dispatcher is a process that scans all the queues for a call that no dispatcher from the current node is already bound to, binds to it and tries to offer it to one of its local agents. If any matching agents are found, they are prioritized so that agents with only the minimum required skills are preferred over more versatile agents. This list is compared to the lists submitted by dispatchers on other nodes, and the system selects the best match by looking at skills, idle time and path costs. If no agents on the current node match, the dispatcher unbinds itself and tries to find another call.
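
The prioritization can be sketched roughly as a filter followed by a sort. This Python example is an illustration under assumptions: the Candidate fields, the rank function, the preference for the longest-idle agent and the order in which the three criteria break ties are not taken from the real dispatcher.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    login: str
    skills: frozenset
    idle_seconds: int
    path_cost: int   # assumed: 0 for a local agent, higher for remote nodes


def rank(call_skills, candidates):
    """Return eligible agents, least versatile first."""
    required = frozenset(call_skills)
    # Keep only agents holding every required skill.
    matches = [c for c in candidates if required <= c.skills]
    # Fewest skills first, then longest idle, then cheapest path (assumed order).
    return sorted(
        matches,
        key=lambda c: (len(c.skills), -c.idle_seconds, c.path_cost),
    )
```

Preferring the agent with the fewest skills keeps more versatile agents free for calls that only they can handle.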

Calls

A call is a process designated to maintain state for a particular call. When a new call is received, or generated for outbound, the appropriate media manager starts a process whose job is to handle the call from then on. The call process handles updates from the media (a call hangup from FreeSWITCH™, for example) and updates from the routing engine (entering queue, bridging to an agent, etc.). Each call is a separate process to ensure fault tolerance; if any single call experiences a fatal error and its process exits, the rest of the calls on that media manager are unaffected.

Lifecycle of an inbound call

  • 'Call' enters a media manager destined for a queue (decided by DNIS, email address, etc)
  • A call tuple is created, initial skills are applied and it's inserted into the requested queue
  • An 'operator' process is spawned which will handle applying recipes, etc
  • A dispatcher binds to the call
  • The operator requests all its dispatchers to return a list of eligible agents
  • The list of agents is sorted according to skills, idle time and path cost
  • The list of agents is iterated through by the operator until one of them can be set ringing or the list is exhausted
  • The agent starts ringing
  • The agent picks up and is now 'oncall' with whatever media type this call is
  • The agent deals with the call in the usual way
  • The agent ends the call and goes to wrapup
  • The agent ends wrapup and a billing record is written
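
The step where the operator walks the sorted agent list can be sketched as a simple loop. This is a hypothetical Python illustration; offer and the try_ring callback are made-up names, and in the real system ringing an agent involves the agent process and the media manager.

```python
def offer(call_id, ranked_agents, try_ring):
    """Try each agent in order; return the first one set ringing, else None.

    `try_ring(agent, call_id)` is a hypothetical callback that returns True
    if the agent could be moved to the ringing state for this call.
    """
    for agent in ranked_agents:
        if try_ring(agent, call_id):
            return agent
    # List exhausted: the call stays in queue for a later attempt.
    return None
```

If the first agent in the list has gone released in the meantime, try_ring returns False for that agent and the call is offered to the next one.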