Posts Tagged ‘Outliers’

Detecting outliers in US census data

How can outliers in large data sets be identified automatically, for example to detect fraud or data quality issues? In this example, we analyze anonymized US census data and identify clear outliers. We also explain how the same technique can be applied to other types of data.

Analyzed data:

The following anonymized US census variables were made available by the US government:

  • Age
  • Worker class
  • Industry code
  • Occupation
  • Education level
  • Enrollment in an educational institution
  • Marital status
  • Race
  • Hispanic origin
  • Sex
  • Member of a labor union
  • Reason for unemployment
  • Capital gains & losses
  • Dividends
  • Tax filer status
  • Country of birth
  • Number of children in the household
  • …several others…
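As a rough illustration only, and not necessarily the technique used in this post, here is one simple way to flag outliers in variables like these. It is a minimal sketch under stated assumptions: numeric variables (such as age) are scored with a robust z-score based on the median and the median absolute deviation, and categorical variables (such as worker class) are checked for rarely occurring values. The function names, the thresholds (3.5 and 5%), and the sample values are all hypothetical and do not come from the census data.

```python
from collections import Counter
from statistics import median


def robust_z_scores(values):
    """Distance of each value from the median, scaled by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median: no spread to scale by.
        return [0.0 if v == med else float("inf") for v in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data is roughly normal.
    return [0.6745 * (v - med) / mad for v in values]


def numeric_outliers(values, threshold=3.5):
    """Indices of values whose absolute robust z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


def rare_categories(values, min_share=0.05):
    """Category labels that make up less than min_share of all records."""
    counts = Counter(values)
    total = len(values)
    return {label for label, n in counts.items() if n / total < min_share}


# Made-up sample values, for illustration only (not real census records).
ages = [34, 41, 29, 52, 38, 45, 31, 47, 36, 212]
worker_class = ["private"] * 25 + ["government"] * 14 + ["never worked"]

flagged_ages = numeric_outliers(ages)
flagged_classes = rare_categories(worker_class)

print("Suspicious age records (by index):", flagged_ages)
print("Rare worker-class values:", flagged_classes)
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value can inflate the standard deviation enough to hide itself. A rare category is not necessarily an error, so flagged records are candidates for review rather than confirmed problems.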

