

DeTistics: Statistics by De
About De
De is a data scientist with a background in social research methodology and expertise in advanced quantitative methodology, data mining, and text mining. Thus far, De has worked mostly with educational data and game log data. Prior to becoming involved in research, De was a teacher for a number of years, and before that a database administrator.
De also knits. A gallery of some of De's best sweaters can be found here.
Links to open-source versions of De's publications can be found below.
Publications

Statistics
Advanced statistical topics in educational research focus on measurement issues such as dimension reduction, estimation, validity, reliability, and generalizability. Methods include factor analysis, structural equation modeling, hierarchical linear modeling, and item response theory.
De's publications in this area include papers on using exploratory factor analysis to identify the different kinds of instructional techniques used to teach science to English language learners and on using mediation studies, linear regression, and analysis of variance to examine differences in in-game behavior.

Data Mining
Click-stream log data from games and simulations is difficult to analyze directly because it consists of a large mass of low-level process data. Data mining techniques can be used to reduce that data to a smaller set of substantively meaningful features.
De's publications in this area include papers about logging game data to support analyses, using cluster analysis to identify the strategies players use to solve levels in the game, using sequence mining to identify changes in strategy use, using classification to identify different learning trajectory types, and applications of the data mining results to make targeted modifications to a game.
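As a rough illustration of what reducing low-level log data to a smaller set of features can look like, the sketch below collapses a handful of raw events into one count vector per player and level. The event fields, action labels, and the `extract_features` helper are all hypothetical and chosen only for this example; they are not taken from De's publications or from any particular game's logging format.

```python
from collections import Counter, defaultdict

# Hypothetical click-stream events: (player_id, level, action).
# Field names and action labels are illustrative assumptions.
EVENTS = [
    ("p1", 1, "move"),
    ("p1", 1, "move"),
    ("p1", 1, "reset"),
    ("p1", 1, "submit"),
    ("p2", 1, "move"),
    ("p2", 1, "submit"),
    ("p2", 1, "submit"),
]

# Fixed action order so every feature vector has the same layout.
ACTIONS = ("move", "reset", "submit")


def extract_features(events):
    """Collapse raw events into one count vector per (player, level)."""
    counts = defaultdict(Counter)
    for player_id, level, action in events:
        counts[(player_id, level)][action] += 1
    # A Counter returns 0 for actions a player never performed.
    return {
        key: [counter[action] for action in ACTIONS]
        for key, counter in counts.items()
    }


features = extract_features(EVENTS)
for key, vector in sorted(features.items()):
    print(key, dict(zip(ACTIONS, vector)))
```

Fixed-length vectors like these are the kind of input that a cluster analysis or classifier could then work with, although real feature sets would likely be richer than simple action counts.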

Text Mining
Extracting semantic meaning from text is one of the most difficult problems in natural language processing. SemScape, a rule-based system, uses grammatical information to identify the relationships between words in a given free-text document and exports that information in the form of TextGraphs. A second rule set is then used to extract semantic information from those graphs.
De's publications in this area include papers about SemScape, as well as applications of SemScape to extract InfoBox information from unstructured Wikipedia text, generate ontologies from free text, and automatically score short essays for content.