Data Wrangler at Automattic: 2011 – present
M.S., Computer Science, University of Colorado: 2009 – 2011
Engineer at Broadcom and Teradyne (Hardware, Software, Verification): 2000 – 2009
B.S., Electrical Engineering, Cornell University: 2000
I am interested in using statistical techniques to extract useful information from large data sets. I have applied machine learning to natural language processing tasks and to problems in other domains. I am most interested in implementing these systems and scaling them up to solve real problems.
- Automattic.com: WordPress.com Data Team: July 2011 – Present
Helped to grow multi-disciplinary team from 5 to 18 engineers, data scientists, and designers.
Took on diverse roles in a flexible, mostly flat team structure including individual contributor, project lead, and co-lead of the team.
Scaled a 40+ node Elasticsearch cluster to support 50+ million search requests per day across 2 billion documents.
Led Elasticsearch development and maintenance efforts across the company, enabling dozens of developers to build fast, reliable search systems for many different applications.
Built and shipped multiple front-end features used by millions of users every day: Related Posts, WordPress.com Notifications, and WordPress.com Stats. Participated in the design of these features from conception to launch.
Helped drive the direction of the Data Team as it scaled out data systems on many different technologies: Elasticsearch, Hadoop, HDFS, Kafka, Hive, Impala, and Spark. Projects regularly required scaling to handle many terabytes of data and millions of events per day.
Contributed to data science efforts on engagement, content recommendation, search, and ad hoc data analysis.
- J.D. Power and Associates: Web Intelligence: June 2010 – Dec 2010
Built an integrated NLP research infrastructure (text tokenization, HMM-based part-of-speech tagging, and an RRM dependency parser) for a science team researching automated sentiment analysis of online media.
Researched sequence labeling techniques for identifying sentiment, entity mentions, and meronymy in text documents. I later used this infrastructure for my Master’s thesis on relation extraction.
- Broadcom: High Def Video Chip Development: 2005 – 2007
Responsible for planning, implementing, and maintaining C++ and Perl verification and development environments for the DDR2 memory system and MIPS processors in an HD DVD/Blu-ray system-on-a-chip. Worked with a large team to design, verify, and bring multiple versions of the chip to production.
- Teradyne: System Platform Development: 2000 – 2005
Designed and verified a range of large and small FPGA projects, taking designs from early concept through completion. Heavily involved in designing and implementing a distributed, embedded DSP processing system, which was patented in 2002. Lead technical designer on multiple projects. Developed hardware systems along with extensive software to support their development.
- Master’s Thesis: Relation Extraction on the J.D. Power and Associates Sentiment Corpus: Aug 2010 – April 2011
Used existing and novel techniques to extract relations between entities in the JDPA Sentiment Corpus (a collection of blogs and other social media). I used support vector machines to classify relations, encoding the features with both binary feature vectors and tree kernels. Additionally, I built an ensemble classification scheme that combines predicted relations across multiple sentences to improve entity relation extraction.
- Data Mining Project: Examining News Bias with Topic Modeling: Feb 2011 – April 2011
Performed topic modeling across multiple news sources to explore how much insight can be gained into news bias by examining the variation in the topics across different news sources. We focused on the protests and revolts in Egypt and Libya from Jan 2011 through April 2011 and how the events were discussed by a number of news organizations from around the world.
Details: Final Paper
- Accelerating System Performance Analysis with Machine Learning: Aug 2009 – Aug 2010
Learned, from a human expert’s analysis of a computer system’s performance, which metrics are most likely to lead to the root cause of performance anomalies.
An Examination of the Local Dynamics of Computer Performance (starting on page 16)
Waveform Shape Recognition for Vertical Profiling with an Ensemble of Support Vector Machines
- Modeling of Hybrid Multi-Robot Systems: Fall 2009
Compared discrete-time simulations of a robotic system with probabilistic state-space models of the same system to find better methods for understanding its dynamics.
Cooperative Lifting of an Object Using a Hybrid Robotic System
- Evolutionary Computation Research: Jan 2008 – July 2008
Researched self-adaptive genetic algorithms, in which the parameters of the evolution are themselves optimized by a separate genetic algorithm.
- G. I. Brown, “Relation Extraction on the J.D. Power and Associates Sentiment Corpus,” Master’s thesis, Department of Computer Science, University of Colorado at Boulder, 2011.
- G. I. Brown, “An Error Analysis of Relation Detection in Social Media Documents,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011), Student Session, Portland, OR, 2011.
- G. I. Brown, “An Examination of the Local Dynamics of Computer Performance,” in Projects in Chaotic Dynamics, University of Colorado Department of Computer Science Technical Report CU-CS 1060-10 (www.cs.colorado.edu/publications), 2010.
- US Patent #6966019, “Instrument Initiated Communication for Automatic Test Equipment” (Teradyne), 2002.