Evaluation of Evaluation in Information Retrieval - Tefko Saracevic. A historical approach to IR evaluation.

Saracevic's Definition of Evaluation: assessing the performance or value of a system, process, procedure, product, or policy.

Evaluation Requirements. A system (or a prototype). A criterion or criteria (the objectives of the system). Measures (recall and precision). A measuring instrument (judgments by analysts or users). A methodology (procedures, e.g. those used for TREC).

Levels of Evaluation. Engineering level: hardware and software. Input level: contents of the system (coverage). Processing level: questions about the way inputs are processed, i.e. assessment of algorithms, techniques, and approaches.

Levels of Evaluation, cont. Output level: interactions with the system and the output obtained. Use and user level: applications used for given tasks. Social level: effects on research, productivity, and decision making. Economic-efficiency level: economic-efficiency questions to be addressed at each level of analysis.

Two more classes of evaluation. End-user performance and use: Meyer & Ruiz, 1990; others summarized in Dalrymple & Roderer, 1994. Markets, products, and services from the information industry: Rapp et al., 1990. These evaluations appear often in trade magazines such as Online, Online Review, and Searcher.

Output, User, and Use level evaluations: Fenichel (1981); Borgman (1989); Saracevic, Kantor, Chamis & Trivison (1990); Haynes et al. (1990); Fidel (1991); Spink (1995).

Processing level approaches: "toy collections"; Cranfield (Cleverdon, Mills & Keen, 1966); SMART (Salton, 1971, 1989); TREC (Harman, 1995).

Studies conducted at the social level: evaluating the impact of area-specific IR systems, e.g. the impact of MEDLINE on clinical decision making (Lindberg et al., 1993).

Criteria in IR Evaluation. Relevance as the core criterion (Kent et al., 1955). Other criteria, such as utility and search length, did not stick. Cranfield, SMART, and TREC all revolved around the phenomenon of relevance. Choosing relevance keeps evaluation out of the purely engineering level, because relevance implies use. Relevance is a complex human process, not a binary one, and it depends on circumstances.

Output, User, and Use level evaluations employ a variety of criteria related to utility, success, completeness, worth, satisfaction, value, efficiency, cost, and so on. There is more emphasis on interaction.

Market, Business, Industry Evaluations. Similar to the user and use level. TQM: Total Quality Management. Cost-effectiveness. The debate over relevance is isolated within IR.

Isolation of studies within their levels of origin: algorithms; users and uses; market products and services; social impacts.

Processing level measures of evaluation. Precision: the ratio of relevant items retrieved to total items retrieved, or the probability that a retrieved item is relevant. Recall: the ratio of relevant items retrieved to all relevant items available in a particular file, or the probability that a relevant item will be retrieved.
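The two ratios can be illustrated with a minimal Python sketch. It assumes binary relevance judgments and represents the retrieved and relevant items as sets of identifiers; the function name `precision_recall` and the document ids are hypothetical, chosen only for illustration and not taken from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: item ids returned by the system
    relevant:  item ids judged relevant in the file (binary judgments assumed)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved

    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 items retrieved, 5 relevant items in the file,
# 2 items in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d2", "d4", "d5", "d6", "d7"])
print(p, r)  # 0.5 0.4
```

Both measures share the same numerator; only the denominator changes, which is why a system can raise one of them at the expense of the other.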

Measures: User and Use level. Semantic differentials. Likert scales. Which measures to use? How do the measures compare? How do they affect the results? See Su, 1992.

Measuring Instruments. Mainly, people are the instruments that determine the relevance of retrieved items. Who are the judges? What affects their judgments? How do they affect the results?

Methodological issues surrounding notions of validity and reliability. Collection: how are items selected? Requests: how are they generated? Searching: how is it conducted? Results: how are they obtained? Analysis: what comparisons are made? Interpretation and generalization: what are the conclusions? Are they warranted on the basis of the results? How generalizable are the findings?

Evaluation outside of traditional IR, e.g. digital libraries and the Internet. Evaluation is restricted to the software and engineering levels, each evaluated on its own level. Many applications are well received; however, at the output, user, and use levels these applications are found to be frustrating, unpredictable, wasteful, expensive, trivial, unreliable, and hard to use!

Don't throw the baby out with the bathwater! The Dervin and Nilan (1986) article swung to the other end of the pendulum and called for a paradigm shift from system-centered to user-centered evaluations. Both user-centered and system-centered approaches are needed.

Keep it realistic! Possible solution: the integration of all levels of evaluation for a comprehensive, "true to life" analysis.