Scale and bound font sizes in a tag cloud using natural-log curve fitting.
This article discusses an algorithm for creating a tag cloud with weightings in a controlled range.
This algorithm is simple and easy to use, but it is only suitable for well-distributed clouds — those that
have relatively normal distributions. For a more comprehensive algorithm that does not require this assumption,
look at the following article:
When creating a tag cloud, the entries are weighted by font size, so that the most frequently
occurring tags have the largest font size, while the least frequent tags have the smallest. Keeping the
font sizes within a target range regardless of the relative and changing weights of the tags requires a
well-calibrated algorithm.
A stub for my actual algorithm is presented, so that you can fill in your own data gathering and calibration as desired.
First we need to set up some variables to hold our intermediate calculations. In practice, you can get rid of some of these variables
by computing them inline, but the solution will be a lot clearer if we can do it one calculation at a time.
/* OVERALL CALCULATIONS */
int numOfTags = 0;            // total distinct tags
int numOfTagInstances = 0;    // total instances of all tags
int targetFontSize = 11;      // mean font size
float avgTagInstances = 0F;   // average instances of each tag

/* CALCULATIONS ABOUT CURRENT TAG */
int fontSize = 0;             // font size
float instanceDeviation = 0F; // deviation from avg instances
int fontSizeDeviation = 0;    // deviation from mean font size
Next, we need to compute the overall averages, so we can tell how much a given tag deviates from the mean. You'll need to compute the
total number of distinct tags and the total number of instances of tags based on your data. The instances are the occurrences of the
tag in context. For example, if three articles are tagged with 'ASP.NET', then ASP.NET has three instances.
numOfTags = /* compute this value */;
numOfTagInstances = /* compute this value */;

avgTagInstances = ((float)numOfTagInstances) / numOfTags;
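As a minimal sketch of how the two totals above might be gathered, assume the data arrives as one list of tag names per article. That input shape, the `TagTotalsSketch` class, and the `CountInstances` helper are hypothetical names for illustration, not part of the original stub.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagTotalsSketch
{
    // Hypothetical input shape: one sequence of tag names per article.
    // Returns how many times each distinct tag occurs across all articles.
    public static Dictionary<string, int> CountInstances(IEnumerable<IEnumerable<string>> articleTags)
    {
        var counts = new Dictionary<string, int>();
        foreach (var tags in articleTags)
        {
            foreach (var tag in tags)
            {
                // 'current' is 0 when the tag has not been seen yet
                counts.TryGetValue(tag, out int current);
                counts[tag] = current + 1;
            }
        }
        return counts;
    }

    public static void Main()
    {
        // Made-up sample: three articles, all tagged 'ASP.NET'
        var articleTags = new List<string[]>
        {
            new[] { "ASP.NET", "C#" },
            new[] { "ASP.NET", "CSS" },
            new[] { "ASP.NET" },
        };

        Dictionary<string, int> counts = CountInstances(articleTags);

        int numOfTags = counts.Count;                // 3 distinct tags
        int numOfTagInstances = counts.Values.Sum(); // 5 instances in total
        float avgTagInstances = ((float)numOfTagInstances) / numOfTags;

        Console.WriteLine($"{numOfTags} tags, {numOfTagInstances} instances, average {avgTagInstances:F2}");
    }
}
```

If there are no tags at all, the division produces NaN rather than throwing, so you may want to skip rendering the cloud in that case.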
Now, we can go through our tags, and process each one. We'll compute the number of instances of each tag, then figure out how
this deviates from the mean number of instances per tag. Then we'll use a bit of curve fitting to find a bounded font-size to
represent our tag's popularity.
foreach (/* for each tag you want to print */)
{
    // how many times this tag appears
    int instancesOfThisTag = /* compute this value */;

    // start with the font size set to the target size
    fontSize = targetFontSize;

    // find the deviation from the mean
    instanceDeviation = instancesOfThisTag - avgTagInstances;

    // scale the font-size deviation based on the deviation from the
    // average number of instances
    //   * the instance number deviation is fitted to the
    //   * curve of the natural logarithm, which grows very
    //   * slowly, so that font sizes will not deviate
    //   * unreasonably far from the target mean font size.
    //   * this gives us a tight and predictable range of
    //   * font sizes within which our values will fall, and
    //   * prevents extremely popular or unpopular tags from
    //   * having extreme font size deviations.
    fontSizeDeviation = (int) Math.Round(2.5 *
        Math.Log(
            1 + Math.Abs(instanceDeviation)
        )
    );

    // since we had to use the absolute value of the deviation above,
    // we re-introduce the sign here, so the scaling is either smaller
    // or larger based on whether the number of instances is smaller
    // or larger than the average number of instances
    if (instanceDeviation > 0)
        fontSize += fontSizeDeviation;
    else if (instanceDeviation < 0)
        fontSize -= fontSizeDeviation;

    /* Now, print out the tag using the calculated font size */

}
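For readers who want something to run, here is one way the stub above might be filled in as a console program. It is only a sketch: the sample counts are made up, the `TagCloudSketch` class and `FontSizeFor` helper are hypothetical names, and printing each tag as an HTML `span` with a pixel font size is an assumption about how you render the cloud.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class TagCloudSketch
{
    // Same calculation as the loop body: log-scaled deviation from the
    // target font size, with the sign of the instance deviation restored.
    public static int FontSizeFor(int instancesOfThisTag, float avgTagInstances, int targetFontSize)
    {
        float instanceDeviation = instancesOfThisTag - avgTagInstances;

        int fontSizeDeviation = (int)Math.Round(
            2.5 * Math.Log(1 + Math.Abs(instanceDeviation)));

        // a zero deviation gives a zero offset, so >= is safe here
        return instanceDeviation >= 0
            ? targetFontSize + fontSizeDeviation
            : targetFontSize - fontSizeDeviation;
    }

    public static void Main()
    {
        // Made-up sample data: tag name -> number of instances
        var tagCounts = new Dictionary<string, int>
        {
            ["ASP.NET"] = 7,
            ["C#"] = 4,
            ["CSS"] = 3,
            ["SQL"] = 2,
        };

        int targetFontSize = 11;
        int numOfTags = tagCounts.Count;
        int numOfTagInstances = tagCounts.Values.Sum();
        float avgTagInstances = ((float)numOfTagInstances) / numOfTags; // 16 / 4 = 4

        foreach (var entry in tagCounts)
        {
            int fontSize = FontSizeFor(entry.Value, avgTagInstances, targetFontSize);

            // Assumed output format: one inline-styled span per tag
            Console.WriteLine(
                $"<span style=\"font-size:{fontSize}px\">{WebUtility.HtmlEncode(entry.Key)}</span>");
        }
    }
}
```

With this sample the average is 4 instances per tag, so ASP.NET comes out at 14px, C# at 11px, CSS at 9px, and SQL at 8px.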
The use of the natural logarithm as a basis for our font-size deviation gives us a good range of font sizes. With a target font size of 11px,
I generally see calculated font-sizes in the range of 9px to 14px resulting from this algorithm, with 8px and 15px being the absolute extremes.
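The range you see depends on how far your own tag counts stray from the average, because the logarithm keeps growing, however slowly. The sketch below prints the font-size offset for a few deviations so you can check the formula against your data. The `FontRangeSketch` class and the `Clamp` helper, with its `min` and `max` parameters, are hypothetical additions for cases where you need a hard limit; they are not part of the algorithm as presented.

```csharp
using System;

public static class FontRangeSketch
{
    // The article's formula: log-scaled offset from the target font size.
    public static int FontSizeDeviation(float instanceDeviation)
    {
        return (int)Math.Round(2.5 * Math.Log(1 + Math.Abs(instanceDeviation)));
    }

    // Hypothetical extra step: force a font size into [min, max].
    public static int Clamp(int fontSize, int min, int max)
    {
        return Math.Max(min, Math.Min(max, fontSize));
    }

    public static void Main()
    {
        foreach (float deviation in new[] { 0f, 1f, 3f, 4f, 10f, 100f })
        {
            Console.WriteLine($"deviation {deviation}: offset {FontSizeDeviation(deviation)}");
        }

        // A tag 100 instances above the average, held to an 8px-15px range
        int unclamped = 11 + FontSizeDeviation(100f);
        Console.WriteLine($"unclamped {unclamped}px, clamped {Clamp(unclamped, 8, 15)}px");
    }
}
```

The offsets for those deviations are 0, 2, 3, 4, 6, and 12, so a tag has to sit far from the average before it leaves the 8px to 15px band around an 11px target.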
This work is licensed under a Creative Commons Attribution 3.0 United States License.
Please link to this article in your source code comments if you use this content.