CS 6750 Quiz 3

50 flashcards covering representations, GOMS, evaluation methods, and statistics for HCI

What You'll Learn

Free flashcards for Human-Computer Interaction Quiz 3 topics: representations, affordances, constraints, learnability principles, GOMS models, cognitive task analysis, usability evaluation methods, and statistical hypothesis testing. Ideal for graduate HCI students and UX researchers.

Key Topics

  • Representations: four criteria, mental models, and the hiker/wolves problems
  • Affordances, signifiers, mapping, perceived vs. actual affordance
  • Constraints (physical, cultural, semantic, logical) and discoverability
  • Learnability principles (Dix et al.) and design trade-offs
  • GOMS model variants: KLM, CMN-GOMS, NGOMSL
  • Cognitive task analysis vs. GOMS: processor vs. predictor models
  • Slips vs. mistakes and the Norman action cycle
  • Evaluation types: qualitative, empirical, predictive
  • Statistical tests: chi-squared, t-test, ANOVA, binomial
  • Hypothesis testing, Type I errors, data types, and study design

Looking for more human-computer interaction resources? Visit the Explore page to browse related decks or use the Create Your Own Deck flow to customize this set.

How to study this deck

Start with a quick skim of the questions, then launch study mode to flip cards until you can answer each prompt without hesitation. Revisit tricky cards using shuffle or reverse order, and schedule a follow-up review within 48 hours to reinforce retention.

Preview: CS 6750 Quiz 3

Question

What are the four criteria for a GOOD REPRESENTATION?

Answer

1. Makes relationships explicit
2. Brings objects and relationships together
3. Excludes extraneous details
4. Exposes natural constraints

Question

What are the five LEARNABILITY PRINCIPLES (Dix et al.)?

Answer

1. Predictability
2. Synthesizability
3. Familiarity
4. Generalizability
5. Consistency

⚠️ These are NOT the same as the four criteria for good representations.

Question

What is the difference between an AFFORDANCE and a SIGNIFIER?

Answer

An affordance is an inherent property of an object that determines how it can be used (e.g., a door bar physically opens a latch). A signifier is a cue ADDED to help the user's perceived affordance match the actual affordance (e.g., a 'Push' sign on a door). Designers can't add affordances — they can only add signifiers.

Question

What is the difference between AFFORDANCE and MAPPING?

Answer

Affordance = the design hints at HOW to use the interface (e.g., a draggable slider looks draggable). Mapping = the design makes clear WHAT EFFECT using it will have (e.g., a color slider that fades from white to black shows what dragging it will do). A design can have good affordance but poor mapping, and vice versa.

Question

What is PERCEIVED AFFORDANCE vs. ACTUAL AFFORDANCE? Give an example of a conflict.

Answer

Actual affordance = what an object inherently does. Perceived affordance = what a user thinks it will do. Conflict example: A door with hinges that push open but a pull-style handle — users perceive it should be pulled, but the actual affordance is for pushing. A 'Push' sign is a signifier to resolve this conflict.

Question

What is DISCOVERABILITY and why does it matter?

Answer

Discoverability is the degree to which users can figure out what actions are possible just by looking at the interface — without reading documentation. Example: PowerPoint's toolbars let you browse and recognize available functions. The Mac screenshot shortcut (Cmd+Shift+4) is NOT discoverable — you'd never know it exists unless told.

Question

What is the tension between DISCOVERABILITY and SIMPLICITY?

Answer

Discoverability says functions should be visible so users can find them. Simplicity says don't clutter the interface with too much information. Too many visible options (like an overloaded toolbar) can actually make things harder to find. Good design walks the line between the two.

Question

What are the FOUR TYPES OF CONSTRAINTS? Give an example of each.

Answer

1. Physical — a 3-prong plug can only be inserted one way
2. Cultural — facing forward on escalators, forming a line
3. Semantic — a rearview mirror must reflect from behind (inherent to its meaning)
4. Logical — at the end of furniture assembly, one screw and one hole remain, so logically they go together

Question

What is TOLERANCE / RECOVERABILITY and why does the Visual Studio Ctrl+Y example matter?

Answer

Tolerance means users should be able to undo mistakes and experiment freely. Visual Studio uses Ctrl+Y to DELETE a line (consistent with legacy WordStar), not redo — so users who press Ctrl+Y out of habit don't just fail to redo, they lose their entire redo history. This violates tolerance because the cost of the error is catastrophic and unrecoverable.

Question

What is PERCEPTIBILITY and what's a classic example of a violation?

Answer

Perceptibility = the user's ability to perceive the current state of the system. Classic violation: a ceiling fan with pull chains — neither chain indicates how many speed settings exist or what setting the fan is currently on. You just pull and wait.

Question

What is CONSISTENCY and what's the core trade-off illustrated by Visual Studio?

Answer

Consistency means using conventions that match what users already know from other interfaces. Trade-off: Visual Studio kept Ctrl+Y as 'delete line' to be consistent with legacy WordStar/Visual Basic 6, but violated the modern convention of Ctrl+Y = redo. You can't always be consistent with everything — sometimes conventions conflict.

Question

What is the difference between FLEXIBILITY and EQUITY?

Answer

Flexibility = support multiple ways of doing the same thing (e.g., Ctrl+X and right-click Cut) to accommodate expert and novice users. Equity = all users get the same experience and benefits, even if delivered differently. Example: Password complexity requirements are EQUITABLE — they extend the same level of security to novices who wouldn't choose a strong password on their own. They could be seen as a flexibility violation, but equity wins.

Question

What is the STRUCTURE PRINCIPLE?

Answer

From Constantine & Lockwood: organize the UI purposefully, grouping related things together, separating unrelated things, and making similar things look alike — based on models the user can recognize. Example: The Wall Street Journal's website mirrors structural principles from its print edition (bold headlines, section spacing) because the same organizational logic applies to both.

Question

What are the FIVE LEARNABILITY PRINCIPLES and what does each mean?

Answer

1. Predictability — can you predict what will happen before acting? (e.g., a grayed-out button signals that nothing will happen)
2. Synthesizability — can you see what actions led to the current state? (e.g., an undo menu log)
3. Familiarity — does the interface leverage real-world knowledge? (e.g., red = bad, green = good)
4. Generalizability — does knowledge of one interface transfer to others? (e.g., Ctrl+S always saves)
5. Consistency — do similar tasks within the interface behave the same way?

Question

What is a MENTAL MODEL and how do representations affect it?

Answer

A mental model is the internal understanding a user builds of how a system works. Representations shape mental models — a good representation helps users build an accurate mental model, making the interface learnable. A bad representation leads to confusion. Example: car climate control where pressing 'Auto' doesn't turn it off — the system violates the user's mental model of a toggle.

Question

What makes a representation 'good'? Use the hiker example.

Answer

The hiker problem: described verbally, it's hard to tell whether the hiker is ever at the same spot at the same time of day on both the ascent and the descent. Change the representation to show BOTH days simultaneously on the same graph — it becomes obvious the two paths must cross. A good representation makes the solution self-evident.

Question

What are the FOUR COMPONENTS of a GOMS model?

Answer

Goals — what the user wants to accomplish
Operators — atomic, observable actions (the smallest unit; can't be broken down further)
Methods — sequences of operators that accomplish a goal
Selection rules — criteria for choosing between methods

Example: Goal = disable home alarm. Methods = keypad OR key fob. Selection rule = hands full → keypad; hands free → key fob.

Question

What are the THREE VARIANTS of GOMS beyond basic GOMS?

Answer

1. KLM-GOMS (Keystroke-Level Model) — the simplest; assign time costs to operators and sum them to predict task time
2. CMN-GOMS — adds sub-methods and conditions in a strict goal hierarchy; very granular (goals go down to 'delete phrase')
3. NGOMSL (Natural GOMS Language) — a natural-language form; the most human-readable
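The KLM prediction is just a sum of per-operator time costs. Here is a minimal sketch — the operator times are commonly cited Card, Moran & Newell estimates (treat them as illustrative, not definitive), and the task sequence is a hypothetical example, not from the course:

```python
# Minimal KLM-GOMS sketch: predict task time by summing operator costs.
# Times are commonly cited Card, Moran & Newell estimates (illustrative).
OPERATOR_TIME = {
    "K": 0.28,  # keystroke (average typist)
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
    "B": 0.10,  # press or release a mouse button
}

def predict_seconds(operators):
    """Sum the unit time cost of each operator in the sequence."""
    return sum(OPERATOR_TIME[op] for op in operators)

# Hypothetical task: home to mouse, point at a menu, click (press +
# release), think, then type a two-letter shortcut.
sequence = ["H", "P", "B", "B", "M", "K", "K"]
print(round(predict_seconds(sequence), 2))  # → 3.61
```

Comparing the summed times of two candidate interaction sequences is exactly how KLM identifies the more efficient design.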

Question

What are the STRENGTHS and WEAKNESSES of the GOMS model?

Answer

Strengths: formalizes user interaction into measurable steps; allows predictions about efficiency; helps identify unnecessary operators.
Weaknesses: assumes expert users (no novice or error modeling); doesn't capture internal reasoning; misses complexity (e.g., deciding whether to drive or take transit).

Question

What is COGNITIVE TASK ANALYSIS and how does it differ from GOMS?

Answer

CTA focuses on the INTERNAL cognitive processes behind a task — memory, attention, reasoning — not just observable inputs/outputs. GOMS is more behaviorist (what goes in, what comes out). CTA is cognitivist (what is the user THINKING). Example: watching someone drive, you'd never see from the outside that they're monitoring fuel level or route progress — CTA captures these invisible cognitive tasks.

Question

What are the STEPS of a Cognitive Task Analysis?

Answer

1. Collect preliminary knowledge (observe people doing the task)
2. Identify knowledge representations (what KINDS of knowledge are needed — sequences? checklists? a web of facts?)
3. Populate knowledge representations (what the user actually knows — specific steps, things to monitor)
4. Analyze and verify data (confirm your interpretation with users)
5. Formalize into comparable structures
6. Format results for design application

Question

What is the PROCESSOR MODEL vs. PREDICTOR MODEL of the user?

Answer

Processor model: treats the user as an input-output machine and focuses on observable actions. GOMS is an example.
Predictor model: tries to understand the user's internal cognition, reasoning, and predictions. Cognitive task analysis is an example.
Behaviorism ↔ processor model; cognitivism ↔ predictor model.

Question

What is a SLIP vs. a MISTAKE?

Answer

Slip: the user KNOWS what they want to do but does something different (e.g., pressing left instead of right in Tetris).
Mistake: the user knows what they want to accomplish but chooses the wrong action because they don't know the right one (e.g., pressing counter-clockwise rotate when you want clockwise because you don't know which is which).
Slips → fix with constraints and mappings. Mistakes → fix with discoverability and representations.

Question

What are the THREE TYPES OF EVALUATION?

Answer

1. Qualitative evaluation — interpretive feedback from users (what they like or dislike, what's easy or hard)
2. Empirical evaluation — controlled experiments producing quantitative, statistically verifiable results
3. Predictive evaluation — evaluation WITHOUT real users, using expert judgment or models

Question

What is FORMATIVE vs. SUMMATIVE evaluation?

Answer

Formative: done DURING the design process to improve the interface. Most evaluations should be this. Summative: done AT THE END to conclusively prove what the final interface achieves (often quantitative). In practice, formative evaluation is preferred — the design lifecycle ideally never truly ends.

Question

What are the FOUR DATA QUALITY TERMS in evaluation?

Answer

Reliability — consistency over time (Amanda always says 2:30 = reliable)
Validity — accuracy of measurement (Amanda always says 2:30 but it's actually 1:30 = reliable but NOT valid)
Generalizability — can conclusions apply to people beyond the study participants?
Precision — how specific is the measurement? (1:30 vs. 1:31:27)

Question

What are the FIVE EVALUATION METRICS for interfaces?

Answer

1. Efficiency — how long does it take to complete a task?
2. Accuracy — how many errors does the user commit?
3. Learnability — how long until a new user reaches a defined level of expertise?
4. Memorability — how well does the user remember the interface after time away?
5. Satisfaction — does the user enjoy using it? What's their cognitive load?

Question

What is the THINK-ALOUD PROTOCOL and what's its key weakness?

Answer

Users verbalize their thoughts while using the interface — what they see, what they expect, why they act. Weakness: thinking aloud CHANGES behavior. Users become more deliberative and less intuitive, meaning they often figure out confusing interfaces that real users (who don't think aloud) would fail at.

Question

What is a POST-EVENT PROTOCOL and what's its key weakness?

Answer

Users complete the session first, then give feedback at the end. More natural behavior during use. Weakness: users may forget difficulties that happened early in the session by the time they give feedback.

Question

What is the difference between BETWEEN-SUBJECTS and WITHIN-SUBJECTS design?

Answer

Between-subjects: each participant experiences only ONE condition. Groups are compared against each other. Simpler, but needs more participants. Within-subjects: each participant experiences ALL conditions. Doubles data per participant, but requires more of their time and you must randomize the ORDER of treatments to control for order effects.

Question

What is RANDOM ASSIGNMENT and why is it critical?

Answer

Randomly assigning participants to treatment groups controls for confounding variables. Without it, biases creep in — e.g., if all punctual participants ended up in one condition, or if the experimenter got better at running sessions partway through the study, results would be skewed and misleading.

Question

What is the NULL HYPOTHESIS vs. ALTERNATIVE HYPOTHESIS?

Answer

Null hypothesis: the default assumption that there is NO difference between conditions. We assume this is true until the evidence says otherwise. Alternative hypothesis: what we're trying to prove — that there IS a real difference. We accept the alternative only when the observed difference would be very unlikely (conventionally, less than 5% probable) if the null hypothesis were true.

Question

What is STATISTICAL SIGNIFICANCE and what threshold is used?

Answer

A result is statistically significant if a difference at least as large as the one observed would have less than a 5% chance of occurring if the null hypothesis were true (p < 0.05). At that threshold, we reject the null hypothesis and accept the alternative. Just because two averages differ doesn't mean the difference is significant — significance also depends on the standard deviation.

Question

What is a TYPE I ERROR (false positive) and what causes it?

Answer

A Type I error is falsely rejecting the null hypothesis — concluding there's a real difference when there isn't. Repeated testing raises this risk. Analogy: if you have a 1-in-20 chance of winning and play 20 times, you'll likely win once — not because you got better, but because your overall exposure increased. Solution: use ANOVA instead of multiple t-tests.
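The "play 20 times" analogy is simple arithmetic: if each test independently has a 5% false-positive rate, the chance of at least one false positive across k tests is 1 − 0.95^k. A quick sketch (not from the course, but standard probability):

```python
# Why repeated testing inflates Type I errors: with a 5% false-positive
# rate per test, the chance of AT LEAST ONE false positive across k
# independent tests is 1 - (1 - alpha)^k.
def family_wise_error(alpha, k):
    """Probability of one or more false positives over k independent tests."""
    return 1 - (1 - alpha) ** k

print(round(family_wise_error(0.05, 1), 3))   # single test → 0.05
print(round(family_wise_error(0.05, 20), 3))  # twenty tests → 0.642
```

At 20 comparisons there is roughly a 64% chance of at least one spurious "significant" result, which is why ANOVA (one simultaneous test) is preferred over many pairwise t-tests.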

Question

What are the FOUR DATA TYPES and which statistical tests correspond to each?

Answer

Nominal (unordered categories) → chi-squared test
Ordinal (ordered categories) → Kolmogorov-Smirnov or chi-squared test
Interval/ratio (numeric) → t-test (2 groups) or ANOVA (3+ groups)
Binomial (success/failure) → binomial test

Key rule: the data type determines the test.

Question

When do you use a CHI-SQUARED TEST?

Answer

When comparing distributions of NOMINAL or ORDINAL data across two or more groups. Example: does the distribution of majors differ between online and in-person students? Works with 2 OR more treatment levels. Weakness: doesn't tell you WHERE the difference is — only that one exists.
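The chi-squared statistic itself is easy to compute from a contingency table: compare each observed count to the count expected if the groups had the same distribution. A from-scratch sketch with hypothetical counts (in practice a library routine such as SciPy's `chi2_contingency` also gives you the p-value):

```python
# Chi-squared statistic for a contingency table, from scratch.
def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: major (CS, non-CS) by class format.
observed = [[30, 20],   # online
            [20, 30]]   # in-person
print(round(chi_squared(observed), 2))  # → 4.0
```

The statistic is then compared against the chi-squared distribution for the table's degrees of freedom to decide significance.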

Question

When do you use a T-TEST?

Answer

When comparing the AVERAGES of INTERVAL or RATIO data between exactly TWO groups. Example: average reaction time with orange vs. green alert colors. The difference between averages must be large enough relative to the standard deviation to be considered significant — not just any numerical difference.
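That "relative to the standard deviation" idea is exactly what the t statistic encodes: the difference in means divided by a measure of pooled spread. A sketch of the equal-variance two-sample form, with made-up reaction times (real analyses would use a library such as SciPy's `ttest_ind` to also get the p-value):

```python
import statistics

# Two-sample t statistic (equal-variance form): difference in means
# scaled by the pooled standard error.
def t_statistic(a, b):
    na, nb = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Pooled variance combines the spread of both samples.
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (mean_a - mean_b) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

# Hypothetical reaction times (seconds) for orange vs. green alerts.
orange = [1.2, 1.1, 1.3, 1.0, 1.2]
green  = [1.5, 1.4, 1.6, 1.5, 1.3]
print(round(t_statistic(orange, green), 2))  # → -4.16
```

A large-magnitude t (positive or negative) means the mean difference is big relative to the noise, so the result is more likely to be significant.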

Question

When do you use ANOVA and why is it preferred over multiple t-tests?

Answer

ANOVA (Analysis of Variance) is used for INTERVAL or RATIO data with THREE OR MORE groups. Running multiple t-tests instead raises the risk of a Type I error (false positive). ANOVA tests all groups simultaneously. Weakness: like Chi-squared, it tells you A difference exists but not WHERE — follow up with pairwise t-tests.
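One-way ANOVA boils down to a ratio: variance BETWEEN group means over variance WITHIN groups. A from-scratch sketch with hypothetical task times (a library routine such as SciPy's `f_oneway` would also return the p-value):

```python
import statistics

# One-way ANOVA F statistic: between-group variance / within-group variance.
def f_statistic(*groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = statistics.mean(x for g in groups for x in g)
    group_means = [statistics.mean(g) for g in groups]
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations inside each group.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical task times (seconds) under three interface variants.
a = [10, 12, 11]
b = [14, 15, 13]
c = [10, 11, 12]
print(round(f_statistic(a, b, c), 2))  # → 9.0
```

A large F says at least one group mean differs from the others — but, as the card notes, not which one, so pairwise follow-up tests come next.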

Question

When do you use a BINOMIAL TEST?

Answer

When individual observations are binary — success or failure. Example: which interface allows users to complete a task more often? The data looks like ratio data (e.g., 94.9% vs. 92.1% completion) but a t-test won't work because the individual observations are binary, not continuous measurements.
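An exact binomial test just sums binomial probabilities for outcomes at least as extreme as the observed one. A sketch with hypothetical numbers (SciPy's `binomtest` is the usual library route):

```python
from math import comb

# Exact binomial upper-tail probability: under a null success rate p,
# how likely is a result at least as extreme as k successes out of n?
def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical: 18 of 20 users completed the task on the new interface.
# If the historical completion rate were 70%, how surprising is that?
print(round(binomial_upper_tail(18, 20, 0.7), 4))  # → 0.0355
```

Here the tail probability is below 0.05, so under this (one-sided, illustrative) setup the improvement would count as significant.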

Question

What is HEURISTIC EVALUATION?

Answer

A predictive evaluation method where usability experts evaluate an interface independently against preset heuristics (e.g., Nielsen's Ten) and identify violations. Each evaluator works ALONE to avoid bias. It's faster and cheaper than user testing but less reliable — experts may not represent real users.

Question

What is a COGNITIVE WALKTHROUGH?

Answer

A predictive evaluation method where the designer steps through a task mentally, imagining what a NOVICE user would see, think, and do at each stage. At each step, ask: (1) Would the user know what to do? (2) Would they understand the feedback? Example: walking through a note-taking app and realizing there's no confirmation the note was saved.

Question

What is MODEL-BASED vs. SIMULATION-BASED EVALUATION?

Answer

Model-based: use a task model (like GOMS) built during need-finding and trace through it in the context of your new design to assess whether the interface improves or worsens the task. Simulation-based: take it further — build an AI agent that simulates a human user and can interact with the interface automatically (useful for high-stakes, large-scale projects like air traffic control).

Question

What are the THREE DATA CAPTURE METHODS in qualitative evaluation and what are their trade-offs?

Answer

Video capture: automated, comprehensive, passive — but intrusive, hard to analyze, and doesn't easily capture on-screen actions.
Note-taking: cheap, non-intrusive, easy to analyze — but slow, and misses fast interactions and nuances like hesitation time.
Software logging: automatic, passive, analyzable — but only captures on-screen actions and is limited to working prototypes (useless on paper prototypes).

Question

What is PREDICTIVE EVALUATION and when should (and shouldn't) you use it?

Answer

Evaluation done without real users — using expert review or models to simulate user behavior. Use it: between user sessions as a rapid check to keep the user in mind. Don't use it: as a replacement for real user evaluation. It's better than nothing, but real user feedback is always preferred when possible.

Question

What is the HIKER ON THE MOUNTAIN problem and what principle does it illustrate?

Answer

A hiker goes up a mountain on Day 1 (7am–7pm) and back down on Day 2 (7am–7pm). Was the hiker ever at the same point at the same time on both days? Described verbally, it seems unlikely. Visualize BOTH days simultaneously on the same graph — it's immediately obvious the two paths must cross. Illustrates how a GOOD REPRESENTATION can make a solution self-evident.

Question

What is the WOLVES AND SHEEP (circles and squares) problem and what does it illustrate?

Answer

A puzzle where squares (wolves) can never outnumber circles (sheep) on either side. The key lesson: poor representations (verbal description) make the problem very hard; good ones (visual, wolves/sheep metaphor, showing all possible state transitions) make it tractable. Illustrates all four criteria of a good representation working together.

Question

What are the ADVANTAGES specific to each type of evaluation?

Answer

Qualitative: informs ongoing design, investigates the user's thought process, draws from real participants.
Empirical: identifies provable, generalizable advantages; draws from real participants.
Predictive: does NOT require real users (its biggest strength AND weakness); informs ongoing design; approximates the thought process via expert simulation.

Question

What is INDEPENDENT VARIABLE vs. DEPENDENT VARIABLE in empirical evaluation?

Answer

Independent variable: what you VARY or manipulate (e.g., orange vs. green alert color; online vs. in-person class).
Dependent variable: what you MEASURE in response (e.g., reaction time; distribution of grades).
The data type of each variable determines which statistical test to use.

Question

What are the SEVEN STEPS of designing an evaluation?

Answer

1. Define the task
2. Define performance measures
3. Develop the experiment
4. Recruit participants
5. Do the experiment
6. Analyze the data
7. Summarize findings to inform the next design iteration

Question

What is the difference between QUALITATIVE and QUANTITATIVE data in the context of the design lifecycle?

Answer

Qualitative data is most useful EARLY — it's interpretive, helps understand the user, and informs redesign. Quantitative data is most useful LATE — it requires rigorous evaluation setups and is used to prove or demonstrate improvement. In reality, qualitative is useful at any stage; quantitative only becomes feasible once prototypes are high fidelity.

Question

What is a TREATMENT in empirical evaluation and why must you control for lurking variables?

Answer

A treatment is a condition being compared (e.g., orange vs. green color, interface A vs. interface B). Lurking variables are uncontrolled differences that could explain your results instead of the treatment. Example: if one logo was orange AND circular while the other was teal AND triangular, you couldn't conclude anything about color alone — shape is a lurking variable.