<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Overheard In Providence &#187; data</title>
	<atom:link href="http://www.overheardinprovidence.com/category/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.overheardinprovidence.com</link>
	<description>A blog by EERac</description>
	<lastBuildDate>Thu, 03 Jun 2010 01:19:41 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Distribution requirements</title>
		<link>http://www.overheardinprovidence.com/2007/08/23/distribution-requirements/</link>
		<comments>http://www.overheardinprovidence.com/2007/08/23/distribution-requirements/#comments</comments>
		<pubDate>Thu, 23 Aug 2007 19:38:21 +0000</pubDate>
		<dc:creator>eerac</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[math]]></category>

		<guid isPermaLink="false">http://www.overheardinprovidence.com/2007/08/23/distribution-requirements/</guid>
		<description><![CDATA[Stephen Dubner wrote a post today highlighting the bursty nature of the linescore for the Rangers' recent 30 to 3 victory over the Orioles. Knowing that the Rangers scored 30 runs, Stephen says he would have predicted an inning-by-inning score that looked something like 4 3 1 0 5 6 3 5 3, but the [...]]]></description>
			<content:encoded><![CDATA[<p>Stephen Dubner wrote a <a href="http://freakonomics.blogs.nytimes.com/2007/08/23/on-the-randomness-or-lack-thereof-of-a-baseball-linescore/#comments" onclick="javascript:urchinTracker ('/outbound/article/freakonomics.blogs.nytimes.com');">post</a> today highlighting the bursty nature of the linescore for the Rangers&#8217; recent <a href="http://sports.yahoo.com/mlb/recap;_ylt=ArfxQbwKnQB_RQ8ykjsR40sRvLYF?gid=270822201" onclick="javascript:urchinTracker ('/outbound/article/sports.yahoo.com');">30 to 3</a> victory over the Orioles. Knowing that the Rangers scored 30 runs, Stephen says he would have predicted an inning-by-inning score that looked something like 4 3 1 0 5 6 3 5 3, but the real score was 0 0 0 5 0 9 0 10 6.</p>
<p>Stephen&#8217;s prediction was definitely off the mark. In baseball, even if you know the total number of runs, you still don&#8217;t know the total number of hits (or similarly, men on base). If you have a very even run distribution over the 9 innings, a lot more people probably got on base, since you usually need 2 or 3 players on base before the first run is scored. This makes scoring just a few runs in every inning very unlikely.</p>
<p>Still, you can&#8217;t take this reasoning too far, as a linescore that looks like 0 0 0 0 30 0 0 0 0 is also very suspicious. Specifically, it can only happen 9 ways (the 30 runs could be scored in any one of the 9 innings). In contrast, Stephen&#8217;s numbers could appear in all kinds of orderings; it&#8217;s just that the numbers themselves weren&#8217;t realistically chosen.</p>
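<p>To put rough numbers on that comparison, here is a minimal Python sketch that counts the distinct ways each set of inning totals can be arranged. The helper name distinct_orderings is my own, and it only counts arrangements; it says nothing about how likely any single inning total is.</p>

```python
from collections import Counter
from math import factorial


def distinct_orderings(linescore):
    """Count the distinct ways to arrange the inning totals in a linescore."""
    total = factorial(len(linescore))
    # Swapping two innings with the same total changes nothing,
    # so divide out the rearrangements of each repeated value.
    for repeats in Counter(linescore).values():
        total //= factorial(repeats)
    return total


one_big_inning = [0, 0, 0, 0, 30, 0, 0, 0, 0]
guessed = [4, 3, 1, 0, 5, 6, 3, 5, 3]
actual = [0, 0, 0, 5, 0, 9, 0, 10, 6]

print(distinct_orderings(one_big_inning))  # 9
print(distinct_orderings(guessed))         # 30240
print(distinct_orderings(actual))          # 3024
```

<p>So the guessed totals really can be shuffled far more ways than the single-inning blowout, which is the sense in which the blowout is the more suspicious pattern.</p>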
<p>Obviously it&#8217;s extremely unlikely that anyone would guess the true linescore for a 30 to 3 game. The interesting question is this: Given a bunch of linescores, can you tell if they are real or fake? As Stephen points out, when asked to fake a long sequence of coin flips, people tend to <a href="http://freakonomics.blogs.nytimes.com/2005/08/20/what-do-the-kansas-city-royals-and-my-ipod-have-in-common/" onclick="javascript:urchinTracker ('/outbound/article/freakonomics.blogs.nytimes.com');">severely underestimate</a> how often many heads or tails should appear in a row. Similarly, I&#8217;ve noticed that when people randomly put dots on a sheet of paper they tend to spread the marks more evenly than a truly random process would. As a result, it is often possible to detect falsified data, as human-generated numbers tend to deviate dramatically from the distribution they are meant to imitate.</p>
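<p>The coin-flip claim is easy to check with a small simulation. The Python sketch below is only an illustration under my own assumptions (a fair coin, 100 flips per sequence, 2000 trials, a fixed seed); the function names are mine.</p>

```python
import random


def longest_run(flips):
    """Return the length of the longest streak of identical outcomes."""
    best = 0
    current = 0
    previous = None
    for flip in flips:
        # Extend the streak if this flip matches the last one, else restart it.
        current = current + 1 if flip == previous else 1
        best = max(best, current)
        previous = flip
    return best


def average_longest_run(num_flips, trials, seed=0):
    """Average the longest streak over many simulated fair-coin sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        flips = [rng.choice("HT") for _ in range(num_flips)]
        total += longest_run(flips)
    return total / trials


print(average_longest_run(num_flips=100, trials=2000))
```

<p>For 100 fair flips the average longest streak should come out at roughly seven, so a hand-written sequence with no streak longer than three or four is a reasonable candidate for a fake.</p>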
<p>By far the most interesting thing I&#8217;ve ever seen on this subject is <a href="http://mathworld.wolfram.com/BenfordsLaw.html" onclick="javascript:urchinTracker ('/outbound/article/mathworld.wolfram.com');">Benford&#8217;s Law</a>. In the 1930s, a scientist named Frank Benford observed that the first digit in all sorts of real-world measurements tended to obey a very specific distribution.<br />
Specifically, he saw that the number 1 tends to appear as the first digit about 30 percent of the time, and the number 2 about 18 percent. If you were forging your income taxes you&#8217;d probably be tempted to make up a bunch of random deductions. If you don&#8217;t want to get caught, however, you&#8217;d better make sure the number 1 appears with the correct frequency. This sounds far-fetched, but Benford&#8217;s Law has actually been used to <a href="http://www.nctm.org/resources/content.aspx?id=7678" onclick="javascript:urchinTracker ('/outbound/article/www.nctm.org');">identify fraudulent returns</a>.</p>
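<p>For the curious, the usual statement of Benford&#8217;s Law is that a first digit d shows up with probability log10(1 + 1/d). Here is a minimal Python sketch of that formula, plus a helper that tallies first digits in a list of numbers. The function names are my own, and the powers of 2 are just a convenient test set that is known to track the law closely, not real tax data.</p>

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected share of values whose first digit is `digit` under Benford's Law."""
    return math.log10(1 + 1 / digit)


def first_digit_shares(values):
    """Observed share of each leading digit (1 to 9) among the nonzero values."""
    # Scientific notation puts the leading significant digit first, e.g. 1.024e+03.
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


observed = first_digit_shares([2 ** n for n in range(1, 101)])
for d in range(1, 10):
    print(d, round(benford_expected(d), 3), round(observed[d], 3))
```

<p>The formula gives about 30.1 percent for a leading 1 and 17.6 percent for a leading 2, which matches the figures above. A real fraud check would compare observed shares like these against the expected ones with a proper statistical test, which this sketch does not attempt.</p>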
]]></content:encoded>
			<wfw:commentRss>http://www.overheardinprovidence.com/2007/08/23/distribution-requirements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

