Generalized Scatter Plots

Generalized Telephone Service Scatter Plot without Overlap. The x-axis shows the duration in seconds, the y-axis shows the charges in dollars, and the color represents the number of participants. The original data is shown in the upper-left corner; overlap is reduced stepwise from left to right, and distortion is increased stepwise from top to bottom.

Scatter plots are one of the most powerful and most widely used techniques for visual data exploration. A well-known problem is that scatter plots often have a high degree of overlap, which may occlude a significant portion of the data values shown. In this paper, we propose generalized scatter plots, which allow large amounts of data to be visualized entirely within the display window without overlap. We discuss two variants: binned scatter plots and distorted scatter plots.

The basic idea is to allow the analyst to optimize the degree of overlap, distortion, and binning to generate the best possible view. To support effective use, we provide the capability to zoom smoothly between the traditional scatter plot and our generalized scatter plots. We identify an optimization function that takes the overlap and the distortion of the visualization into account. We evaluate the generalized scatter plots according to this optimization function and show that there usually exists an optimal compromise between overlap and distortion. Our generalized scatter plots have been applied successfully to a number of real-world IT services applications, such as server performance monitoring, telephone service usage analysis, and financial data analysis, demonstrating their benefits over traditional scatter plots.
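To make the trade-off between overlap and distortion concrete, the following is a minimal sketch of one possible cost function. It is not the optimization function used in this paper: the overlap measure (the fraction of points sharing a pixel), the distortion measure (the mean displacement normalized by the display diagonal), the weighted sum, and all function and parameter names are assumptions chosen for illustration.

```python
from collections import Counter
from math import hypot


def overlap_fraction(pixels):
    """Fraction of points that share their pixel with at least one other point.

    Assumed overlap measure, for illustration only.
    """
    if not pixels:
        return 0.0
    counts = Counter(pixels)
    overlapping = sum(c for c in counts.values() if c > 1)
    return overlapping / len(pixels)


def mean_displacement(original, placed, diagonal):
    """Mean distance between original and placed positions, divided by the
    display diagonal so that the result lies in [0, 1].

    Assumed distortion measure, for illustration only.
    """
    if not original:
        return 0.0
    total = sum(
        hypot(px - ox, py - oy)
        for (ox, oy), (px, py) in zip(original, placed)
    )
    return total / (len(original) * diagonal)


def cost(original, placed, width, height, weight=0.5):
    """Weighted sum of overlap and distortion (hypothetical combination).

    weight = 1.0 penalizes only overlap, weight = 0.0 only distortion.
    """
    diagonal = hypot(width, height)
    return (weight * overlap_fraction(placed)
            + (1.0 - weight) * mean_displacement(original, placed, diagonal))


# Four points that fall on two pixels in a 10 x 10 display.
original = [(0, 0), (0, 0), (3, 4), (3, 4)]

# Traditional scatter plot: no distortion, but every point overlaps another.
traditional = list(original)

# Generalized placement: each duplicate is moved to a neighboring free pixel.
generalized = [(0, 0), (1, 0), (3, 4), (4, 4)]

cost_traditional = cost(original, traditional, width=10, height=10)
cost_generalized = cost(original, generalized, width=10, height=10)
```

In this toy example the generalized placement removes all overlap at the price of a small displacement, so it scores lower under the assumed cost. A different weighting or different measures could shift where the compromise lies.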

Visualizing population-related data, such as household income, is often challenging because the population density varies across regions. Dense areas suffer from heavy overplotting of points, which decreases the visual effectiveness of a traditional visualization (lower figure). Low-density regions, on the other hand, waste display space that could be used for dense areas.