Creating a Tag Cloud

Tagged as Code: C#, CSS, Web
Scale and bound font sizes in a tag cloud using natural-log curve fitting.

Summary

This article discusses an algorithm for creating a tag cloud whose font-size weightings stay within a controlled range.

This algorithm is simple and easy to use, but it is only suitable for well-distributed clouds, that is, clouds whose tag counts are roughly normally distributed. For a more comprehensive algorithm that does not require this assumption, see the following article:


When creating a tag cloud, the entries are weighted by font size, so that the most frequently occurring tags have the largest font size and the least frequent tags have the smallest. Keeping the font sizes within a target range, regardless of the relative and changing weights of the tags, requires a well-calibrated algorithm.

Below is a stub of my actual algorithm, so that you can fill in your own data gathering and calibration as desired.

Algorithm

First we need to set up some variables to hold our intermediate calculations. In practice, you can get rid of some of these variables by computing them inline, but the solution will be a lot clearer if we can do it one calculation at a time.

```csharp
/* OVERALL CALCULATIONS */
int numOfTags = 0;             // total distinct tags
int numOfTagInstances = 0;     // total instances of all tags
int targetFontSize = 11;       // mean font size
float avgTagInstances = 0F;    // average instances of each tag

/* CALCULATIONS ABOUT CURRENT TAG */
int fontSize = 0;              // font size
float instanceDeviation = 0F;  // deviation from avg instances
int fontSizeDeviation = 0;     // deviation from mean font size
```

Next, we need to compute the overall averages so that we can tell how much a given tag deviates from the mean. You'll need to compute the total number of distinct tags and the total number of tag instances from your data. The instances are the occurrences of a tag in context. For example, if three articles are tagged with 'ASP.NET', then ASP.NET has three instances.

```csharp
numOfTags = /* compute this value */;
numOfTagInstances = /* compute this value */;

avgTagInstances = ((float)numOfTagInstances) / numOfTags;
```
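The two `/* compute this value */` stubs above depend on where your tags are stored. As a minimal sketch, assume (hypothetically) that each article's tags are available as a list of strings; counting the instances per tag then yields both totals. The `TagStats` class and its methods are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the input shape (one sequence of tag names
// per article) is an assumption; adapt it to your own data source.
public static class TagStats
{
    // Counts how many instances each distinct tag has, e.g. three
    // articles tagged "ASP.NET" give "ASP.NET" -> 3.
    public static Dictionary<string, int> CountInstances(
        IEnumerable<IEnumerable<string>> tagsPerArticle)
    {
        var counts = new Dictionary<string, int>();
        foreach (var articleTags in tagsPerArticle)
        {
            foreach (var tag in articleTags)
            {
                counts.TryGetValue(tag, out int n); // n is 0 for a new tag
                counts[tag] = n + 1;
            }
        }
        return counts;
    }

    // Average instances per distinct tag. Returns 0 for an empty
    // cloud instead of the NaN that 0F / 0 would produce.
    public static float AverageInstances(Dictionary<string, int> counts)
    {
        int numOfTags = counts.Count;                // total distinct tags
        int numOfTagInstances = counts.Values.Sum(); // total instances
        return numOfTags == 0 ? 0F : ((float)numOfTagInstances) / numOfTags;
    }
}
```

For articles tagged {C#, Web}, {C#, CSS} and {C#}, `CountInstances` gives C# → 3, Web → 1 and CSS → 1, so `numOfTags` is 3, `numOfTagInstances` is 5 and `avgTagInstances` is about 1.67. Tag matching here is case-sensitive; normalize the names first if your data mixes cases.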

Now we can go through our tags and process each one. For each tag, we compute its number of instances, then figure out how that number deviates from the mean number of instances per tag. Then we use a bit of curve fitting to find a font size, kept close to the target, that represents the tag's popularity.

```csharp
foreach (/* for each tag you want to print */)
{
    // how many times this tag appears
    int instancesOfThisTag = /* compute this value */;

    // start with the font size set to the target size
    fontSize = targetFontSize;

    // find the deviation from the mean
    instanceDeviation = instancesOfThisTag - avgTagInstances;

    // scale the font-size deviation based on the deviation from the
    // average number of instances
    // * the instance number deviation is fitted to the
    // * curve of the natural logarithm, which grows very slowly,
    // * so that font sizes will not deviate unreasonably far from
    // * the target mean font size. this gives us a tight and
    // * predictable range of font sizes within which our
    // * values will fall, and prevents extremely popular or
    // * unpopular tags from having extreme font size
    // * deviations.
    fontSizeDeviation = (int) Math.Round(2.5 *
        Math.Log(
            1 + Math.Abs(instanceDeviation)
        )
    );

    // since we had to use the absolute value of the deviation above,
    // we re-introduce the sign here, so the scaling is either smaller
    // or larger based on whether the number of instances is smaller
    // or larger than the average number of instances
    if (instanceDeviation > 0)
        fontSize += fontSizeDeviation;
    if (instanceDeviation < 0)
        fontSize -= fontSizeDeviation;

    /* Now, print out the tag using the calculated font size */

}
```
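To try the calculation outside a page template, the loop body above can be pulled into a pure function. This is a minimal sketch, not the author's production code: the `TagCloud` class, the `RenderHtml` helper and its inline `style` markup are assumptions made for illustration; the 2.5 scale factor, the natural logarithm and the 11px target come from the article. `Math.Sign` replaces the two `if` statements and has the same effect.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class TagCloud
{
    // Same calculation as the loop body above: start at the target size,
    // then add or subtract round(2.5 * ln(1 + |deviation|)).
    public static int FontSize(int instancesOfThisTag, float avgTagInstances,
                               int targetFontSize = 11)
    {
        float instanceDeviation = instancesOfThisTag - avgTagInstances;
        int fontSizeDeviation = (int)Math.Round(
            2.5 * Math.Log(1 + Math.Abs(instanceDeviation)));

        // Math.Sign returns -1, 0 or 1, re-introducing the deviation's sign
        return targetFontSize + Math.Sign(instanceDeviation) * fontSizeDeviation;
    }

    // Hypothetical rendering step: one <span> per tag with an inline font size.
    public static string RenderHtml(IDictionary<string, int> counts,
                                    int targetFontSize = 11)
    {
        if (counts.Count == 0) return string.Empty; // avoid dividing by zero

        int total = 0;
        foreach (int n in counts.Values) total += n;
        float avgTagInstances = ((float)total) / counts.Count;

        var html = new StringBuilder();
        foreach (var pair in counts)
        {
            int fontSize = FontSize(pair.Value, avgTagInstances, targetFontSize);
            html.Append("<span style=\"font-size: ")
                .Append(fontSize)
                .Append("px\">")
                .Append(WebUtility.HtmlEncode(pair.Key)) // escape tag text
                .Append("</span> ");
        }
        return html.ToString().TrimEnd();
    }
}
```

For the counts C# → 3, CSS → 1 and Web → 1 (average about 1.67), the tags come out at 13px, 10px and 10px. Calling `TagCloud.FontSize(100, 10F)` returns 22, which shows that the logarithm slows growth without imposing a hard ceiling.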

Using the natural logarithm as the basis for the font-size deviation gives a good range of font sizes. With a target font size of 11px, I generally see calculated font sizes from 9px to 14px, with 8px and 15px being the most extreme values I have seen. The logarithm grows slowly but has no hard ceiling, so data with much larger deviations from the average can still produce sizes outside this range.
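As a worked check on those sizes, here is the loop written as a formula, where $n$ is `instancesOfThisTag`, $\bar{n}$ is `avgTagInstances`, $s_0$ is `targetFontSize` and $s$ is `fontSize`. Solving the rounding step for $|d|$ gives the deviation needed for each additional pixel (thresholds are approximate and ignore exact rounding midpoints, where C#'s `Math.Round` rounds half to even):

$$
\begin{aligned}
s &= s_0 + \operatorname{sgn}(d)\,\operatorname{round}\bigl(2.5\,\ln(1 + |d|)\bigr), \qquad d = n - \bar{n} \\
|s - s_0| \ge k \;&\Longleftrightarrow\; |d| \ge e^{(k - 0.5)/2.5} - 1 \\
k = 1, 2, 3, 4, 5 \;&\Longrightarrow\; |d| \ge 0.22,\; 0.82,\; 1.72,\; 3.06,\; 5.05 \quad (\text{approx.})
\end{aligned}
$$

With $s_0 = 11$, a tag therefore needs to sit roughly 1.72 instances above the average to reach 14px, about 3.06 to reach 15px and about 5.05 to reach 16px. On the small side, $|d|$ can never exceed $\bar{n}$, because a tag cannot have fewer than zero instances.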

This work is licensed under a Creative Commons Attribution 3.0 United States License.
Please link to this article in your source code comments if you use this content.

Article Info

Posted January 29, 2007