Systems and methods are provided for classifying text based on language using one or more computer servers and storage devices. A computer-implemented method includes receiving a training set of elements, each element in the training set being assigned to one of a plurality of categories and having one of a plurality of content profiles associated therewith; receiving a population set of elements, each element in the population set having one of the plurality of content profiles associated therewith; and calculating a distribution of elements of the population set over the categories, based on the content profiles associated with and the categories assigned to elements in the training set and the content profiles associated with the elements of the population set, using at least one of a stacked regression algorithm, a bias formula algorithm, a noise elimination algorithm, and an ensemble method consisting of a plurality of algorithmic methods, the results of which are averaged.
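The ensemble option named above, in which the results of several algorithmic methods are averaged, can be illustrated with a minimal sketch. The function name `average_distributions`, the category labels, and the three input estimates are hypothetical and are not taken from the abstract; the individual estimators themselves are not shown.

```python
from typing import Dict, List


def average_distributions(estimates: List[Dict[str, float]]) -> Dict[str, float]:
    """Average several category distributions and renormalize to sum to 1."""
    if not estimates:
        raise ValueError("need at least one estimate")
    categories = sorted({c for est in estimates for c in est})
    # A category missing from one estimate counts as a zero share there.
    mean = {
        c: sum(est.get(c, 0.0) for est in estimates) / len(estimates)
        for c in categories
    }
    total = sum(mean.values())
    if total <= 0:
        raise ValueError("estimates carry no probability mass")
    return {c: v / total for c, v in mean.items()}


# Hypothetical outputs of three separate estimators (illustrative values only).
estimates = [
    {"positive": 0.50, "negative": 0.50},
    {"positive": 0.60, "negative": 0.40},
    {"positive": 0.70, "negative": 0.30},
]
combined = average_distributions(estimates)
print(combined)
```

An unweighted mean is the simplest reading of "averaged"; a weighted scheme would be an equally plausible design and is not ruled out by the text.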
In a computer-assisted clustering method, a clustering space is generated from fixed basis partitions that embed the entire space of all possible clusterings. A lower-dimensional clustering space is created from the space of all possible clusterings by isometrically embedding that space in a lower-dimensional Euclidean space. This lower-dimensional space is then sampled based on the number of documents in the corpus. Partitions that tessellate the space are then developed based on the samples. Finally, using clusterings representative of these tessellations, a two-dimensional representation is created for users to explore.
Anonymously pretesting items for subsequent presentation to participants in a group enables an instructor to validate responses and revise the items accordingly. ... The present invention facilitates anonymous pretesting of items in classrooms (and/or other similar settings) to which the item author has no direct access or knowledge. In some embodiments, pretesting is performed by software used by the instructor/author in his or her own classroom for other tasks. In various implementations, the software shares information with a central clearinghouse anonymously. The central clearinghouse then automatically matches students in the instructor's class with "relevant" students from other classes -- e.g., students that a statistical algorithm predicts will have approximately the same understanding, and will give approximately the same answers, as the instructor's class. ...
Representative embodiments of a method for grouping participants in an activity include the steps of: (i) defining a grouping policy; (ii) storing, in a database, participant records that include a participant identifier, a characteristic associated with the participant, and/or an identifier for a participant’s handheld device; (iii) defining groupings based on the policy and characteristics of the participants relating to the policy and to the activity; and (iv) communicating the groupings to the handheld devices to establish the groups.
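Steps (i) through (iv) above can be sketched as follows. This is only an illustration under stated assumptions: the `Participant` record, the numeric `skill` characteristic, and the two policy names `"similar"` and `"mixed"` are hypothetical, an in-memory list stands in for the database, and the final mapping from device identifier to group number stands in for the actual communication with handheld devices.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Participant:
    participant_id: str
    skill: int       # hypothetical characteristic relevant to the policy
    device_id: str   # identifier of the participant's handheld device


def make_groups(participants: List[Participant], group_size: int,
                policy: str) -> List[List[Participant]]:
    """'similar' groups neighbors in skill; 'mixed' spreads skill across groups."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if policy not in ("similar", "mixed"):
        raise ValueError(f"unknown policy: {policy}")
    ranked = sorted(participants, key=lambda p: p.skill)
    n_groups = -(-len(ranked) // group_size)  # ceiling division
    groups: List[List[Participant]] = [[] for _ in range(n_groups)]
    for idx, person in enumerate(ranked):
        target = idx // group_size if policy == "similar" else idx % n_groups
        groups[target].append(person)
    return groups


def device_assignments(groups: List[List[Participant]]) -> Dict[str, int]:
    """Map each device identifier to its group number (what would be sent out)."""
    return {p.device_id: g for g, members in enumerate(groups) for p in members}


# Hypothetical participant records.
roster = [
    Participant("p1", 1, "dev-1"),
    Participant("p2", 2, "dev-2"),
    Participant("p3", 3, "dev-3"),
    Participant("p4", 4, "dev-4"),
]
similar = device_assignments(make_groups(roster, 2, "similar"))
mixed = device_assignments(make_groups(roster, 2, "mixed"))
print(similar)
print(mixed)
```

Sorting by a single characteristic keeps the sketch short; a policy that weighs several characteristics would need a different ranking or matching step.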
A method for selecting clusterings to classify a predetermined data set of numerical data comprises five steps. First, a plurality of known clustering methods are applied, one at a time, to the data set to generate clusterings for each method. Second, a metric space of clusterings is generated using a metric that measures the similarity between two clusterings. Third, the metric space is projected to a lower-dimensional representation useful for visualization. Fourth, a “local cluster ensemble” method generates a clustering for each point in the lower-dimensional space. Fifth, an animated visualization method uses the output of the local cluster ensemble method to display the lower-dimensional space and to allow a user to move around and explore the space of clusterings.
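The second step, building a metric space of clusterings, can be illustrated with a minimal sketch. The abstract does not name the metric; the pair-disagreement distance used here (the fraction of item pairs that one clustering joins and the other splits) is an assumed stand-in, and the function names and the three example clusterings are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def pair_disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of item pairs that one clustering joins and the other splits."""
    if len(a) != len(b):
        raise ValueError("clusterings must label the same items")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    disagree = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return disagree / len(pairs)


def distance_matrix(clusterings: List[Sequence[int]]) -> List[List[float]]:
    """Pairwise distances between all clusterings, as a square matrix."""
    return [[pair_disagreement(x, y) for y in clusterings] for x in clusterings]


# Each list assigns a cluster label to the same four items (illustrative data).
clusterings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first, with labels swapped
    [0, 1, 0, 1],
]
dist = distance_matrix(clusterings)
for row in dist:
    print(row)
```

Because the distance depends only on which items share a cluster, relabeling a clustering leaves it at distance zero from itself. A matrix like `dist` is the kind of input a projection method in the third step would take.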
A method of computerized content analysis gives “approximately unbiased and statistically consistent estimates” of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing a distribution of a small set of individually classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of labeled elements, and without constraining the content distribution of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to the content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
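One way such an extrapolation could work is sketched below for the two-category case. It rests on an assumption the abstract does not state: that the frequency of each content profile within a category is the same in the labeled and unlabeled sets, so the unlabeled profile frequencies are a mixture of the per-category frequencies. The function names, the profile values `"x"` and `"y"`, and the data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def profile_freqs(profiles: Sequence[str], support: Sequence[str]) -> List[float]:
    """Relative frequency of each profile in `support` among `profiles`."""
    if not profiles:
        raise ValueError("cannot compute frequencies of an empty set")
    counts = Counter(profiles)
    return [counts[s] / len(profiles) for s in support]


def estimate_share(labeled: Sequence[Tuple[str, str]], unlabeled: Sequence[str],
                   cat_a: str, cat_b: str) -> float:
    """Least-squares estimate of the share of cat_a in the unlabeled set."""
    support = sorted({p for p, _ in labeled} | set(unlabeled))
    a = profile_freqs([p for p, c in labeled if c == cat_a], support)
    b = profile_freqs([p for p, c in labeled if c == cat_b], support)
    s = profile_freqs(list(unlabeled), support)
    # Assumed model: s = share * a + (1 - share) * b, solved for share.
    num = sum((si - bi) * (ai - bi) for si, ai, bi in zip(s, a, b))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if den == 0:
        raise ValueError("categories have identical profile frequencies")
    return min(1.0, max(0.0, num / den))  # clip to a valid proportion


# Labeled set: evenly split between two categories (illustrative data).
labeled = ([("x", "A")] * 3 + [("y", "A")] * 1 +
           [("x", "B")] * 1 + [("y", "B")] * 3)
# Unlabeled set: 13 of 20 elements have profile "x".
unlabeled = ["x"] * 13 + ["y"] * 7
share_a = estimate_share(labeled, unlabeled, "A", "B")
print(round(share_a, 3))
```

The labeled set here is split evenly, yet the estimate for the unlabeled set is not pulled toward one half, which reflects the point that the category distribution of the unlabeled elements is not constrained to match that of the labeled elements.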