Building the Bridge Between AI and the Real World

AI is like working with an inept wizard. (Yes, I have a lot of metaphors for this.) When you ask the wizard a question, he responds with the intellect and rapidity of someone who has access to the knowledge of the cosmos. He’s read everything, but he’s a bit dotty. He’s lived his entire life in his lair, consuming his tomes. Despite his vast knowledge, he has no idea what happened in the world yesterday. He doesn’t know what’s in your inbox. Moreover, he knows nothing about your contact list, your company’s proprietary data, or the fact that your cousin’s birthday party got bumped to next Friday. The wizard is a genius. He’s also an idiot savant.

Therein lies the paradox. We have designed amazing tools, but they require a lot of handholding. Context has to be spoon-fed. You can paste in a mountain of reference documents and a virtual novel of a prompt, but that amount of work often eliminates any benefit you get from using an LLM at all. When it does work, it's a victory, but it feels like you've wrestled the LLM into submission instead of working with it.

Users have been cobbling together ad hoc solutions for this problem. Plug-ins. Vector databases. Retrieval systems. These Band-Aids are clever, but fragile. They don’t cooperate with each other. They break when you switch providers. It’s less “responsible plumbing” and more “duct tape and prayer.”

This is where Model Context Protocol (MCP) comes in. It establishes foundational infrastructure rather than creating one more marketplace for custom connectors. MCP sets up standardized rails for integrating context. This shared framework enables models to request context, retrieve it from authorized sources, and use it securely. It replaces the current kludge of vendor-specific solutions with a unified protocol designed to connect AI to real-world systems and data.

As AI transitions from an experimental novelty to practical infrastructure, this utility becomes crucial. For the wizard to be effective, he needs to do more than solve one-off code hiccups or create content for your blog. For true usefulness at scale in a professional environment, you need a standardized way to integrate context. That context has to respect permissions, meet security standards, and stay up to date.

The Problem of Context in AI

Models tend to make things up and they do it with confidence. Sometimes they cite fictional academic papers. Sometimes they invent dates, statistics, or even people. These hallucinations are a huge problem, of course, but they’re a symptom of a much larger issue: a lack of context.

The Context Window Problem

Developers have been building workarounds by providing relevant data as needed: pasting in documents, supplying chunks of a database, and formulating absurdly robust prompts. These fixes help, but every LLM has what we call a context window. The window determines how many tokens a model can remember at any given time. Some of the bigger LLMs have windows that can accommodate hundreds of thousands of tokens, but users still quickly find ways to hit that wall.

Bigger context windows should be the answer, right? But there's our Catch-22: the more data you provide within that window, the more fragile the entire setup becomes. If there's not enough context, the model may very well just make stuff up. If you provide too much, the model bogs down or becomes too pricey to run.

The Patchwork Fixes

The AI community wasn’t content to wait for one of the big players to provide a solution. Everyone rushed to be first-to-market with an assortment of potential fixes.

Custom plug-ins let the models access external tools and databases, extending their abilities beyond the frozen training data. You can see the issue here. Plug-ins designed for one platform won’t work with another. Your workspace becomes siloed and fragmented, forcing you to rework your integrations if you try to switch AI providers.

Retrieval-Augmented Generation (RAG) converts documents into embeddings stored in a vector database, so that only the most relevant chunks get pulled in during a query. This method is pretty effective but requires significant technical skill and ongoing fine-tuning to match your organization's specific requirements.
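
To make the retrieval step concrete, here's a minimal sketch in Python of the idea behind RAG: score stored chunks against a query by cosine similarity and keep the best matches. Random vectors stand in for a real embedding model, and the corpus is invented for illustration.

```python
import numpy as np

# Toy corpus. In a real RAG pipeline each chunk would be embedded with a
# trained model; here random 4-dimensional vectors stand in for embeddings.
chunks = ["Q3 revenue grew 12%", "Refund policy: 30 days", "Office closed Friday"]
rng = np.random.default_rng(0)
chunk_vecs = rng.random((len(chunks), 4))

def retrieve(query_vec, k=2):
    # Cosine similarity between the query vector and every chunk vector.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(sims)[::-1][:k]   # indices of the k most similar chunks
    return [chunks[i] for i in top]

query_vec = rng.random(4)              # stand-in for an embedded user query
print(retrieve(query_vec))             # the chunks you'd paste into the prompt
```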


ZoomIt – Screen zoom and annotation

ZoomIt is a screen zoom, annotation, and recording tool for technical presentations and demos. You can also use ZoomIt to snip screenshots to the clipboard or to a file. ZoomIt runs unobtrusively in the tray and activates with customizable hotkeys to zoom in on an area of the screen, move around while zoomed, and draw on the zoomed image. I wrote ZoomIt to fit my specific needs and use it in all my presentations.

Using ZoomIt

The first time you run ZoomIt it presents a configuration dialog that describes ZoomIt's behavior, lets you specify alternate hotkeys for zooming and for entering drawing mode without zooming, and lets you customize the drawing pen color and size. I use the draw-without-zoom option to annotate the screen at its native resolution, for example. ZoomIt also includes a break timer feature that remains active even when you tab away from the timer window; you can return to the timer window by clicking on the ZoomIt tray icon.

Shortcuts

ZoomIt offers a number of shortcuts that greatly extend its usage.

| Function | Shortcut |
| --- | --- |
| Zoom Mode | Ctrl + 1 |
| Zoom In | Mouse Scroll Up or Up Arrow |
| Zoom Out | Mouse Scroll Down or Down Arrow |
| Start Drawing (While In Zoom Mode) | Left-Click |
| Stop Drawing (While In Zoom Mode) | Right-Click |
| Start Drawing (While Not In Zoom Mode) | Ctrl + 2 |
| Increase/Decrease Line And Cursor Size (Drawing Mode) | Ctrl + Mouse Scroll Up/Down or Arrow Keys |
| Center The Cursor (Drawing Mode) | Space Bar |
| Whiteboard (Drawing Mode) | W |
| Blackboard (Drawing Mode) | K |
| Type in Text (Left Aligned) | T |
| Type in Text (Right Aligned) | Shift + T |
| Increase/Decrease Font Size (Typing Mode) | Ctrl + Mouse Scroll Up/Down or Arrow Keys |
| Red Pen | R |
| Red Highlight Pen | Shift + R |
| Green Pen | G |
| Green Highlight Pen | Shift + G |
| Blue Pen | B |
| Blue Highlight Pen | Shift + B |
| Yellow Pen | Y |
| Yellow Highlight Pen | Shift + Y |
| Orange Pen | O |
| Orange Highlight Pen | Shift + O |
| Pink Pen | P |
| Pink Highlight Pen | Shift + P |
| Blur Pen | X |
| Draw a Straight Line | Hold Shift |
| Draw a Rectangle | Hold Ctrl |
| Draw an Ellipse | Hold Tab |
| Draw an Arrow | Hold Ctrl + Shift |
| Erase Last Drawing | Ctrl + Z |
| Erase All Drawings | E |
| Copy Screenshot to Clipboard | Ctrl + C |
| Crop Screenshot to Clipboard | Ctrl + Shift + C |
| Save Screenshot as PNG | Ctrl + S |
| Save Cropped Screenshot to a File | Ctrl + Shift + S |
| Copy a Region of The Screen To Clipboard | Ctrl + 6 |
| Save a Region of The Screen To a File | Ctrl + Shift + 6 |
| Start/Stop Full Screen Recording Saved as MP4 (Windows 10 May 2019 Update And Higher) | Ctrl + 5 |
| Crop Screen Recording Saved as MP4 (Windows 10 May 2019 Update And Higher) | Ctrl + Shift + 5 |
| Screen Record Only The Window That The Mouse Cursor is Positioned Over Saved as MP4 (Windows 10 May 2019 Update And Higher) | Ctrl + Alt + 5 |
| Show Countdown Timer | Ctrl + 3 |
| Increase/Decrease Time | Ctrl + Mouse Scroll Up/Down or Arrow Keys |
| Minimize Timer (Without Pausing It) | Alt + Tab |
| Show Timer When Minimized | Left-Click On The ZoomIt Icon |
| LiveZoom Mode | Ctrl + 4 |
| LiveDraw Mode | Ctrl + Shift + 4 |
| Start DemoType | Ctrl + 7 |
| Move back to the previous snippet (DemoType) | Ctrl + Shift + 7 |
| Advance to the next snippet (DemoType User-driven Mode) | Space Bar |
| Exit | Esc or Right-Click |


Natural Language AI-Powered Smart UI

Looking for real-world AI examples is a challenge, and part of that challenge comes from Generative AI (GenAI) news dominating the media. It feels like every AI demo involves chatting with GenAI to produce content. The obligatory chat-completion demo has become the to-do-list app of AI demos, and, to make matters worse, it's selling AI short. GenAI relies on large language models (LLMs), which are the brains behind natural language processing tasks. In this article, I'll explore the opportunities presented by LLMs using a real-world research-and-development experiment. This experiment is part of ongoing research into AI-enabled user interface components (aka .NET Smart Components) by Progress Software and Microsoft.


Supervised and Unsupervised Learning

Supervised Learning

Definition:
The model learns from labeled data — meaning each input has a corresponding correct output.

Goal:
Predict an output (label) from input data.

Examples:

  • Email spam detection (Spam / Not Spam)
  • Predicting house prices (Price in $)
  • Handwriting recognition (0–9 digits)

Types:

  • Classification (output is a category): e.g., cat vs dog
  • Regression (output is a number): e.g., predicting temperature

Requires Labels? ✅ Yes

Example Dataset:

| Input Features | Label |
| --- | --- |
| "Free offer now" (email text) | Spam |
| 3 bedrooms, 2 baths, 1500 sq ft | $350,000 |
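
To ground the idea, here's a minimal supervised learning sketch in Python with scikit-learn, fitting a regression model on a tiny, made-up version of the house-price data above (the extra rows are invented for illustration):

```python
from sklearn.linear_model import LinearRegression

# Tiny labeled dataset: [bedrooms, baths, square feet] -> sale price.
X = [[3, 2, 1500], [4, 3, 2200], [2, 1, 900], [3, 2, 1600]]
y = [350_000, 520_000, 210_000, 365_000]

model = LinearRegression().fit(X, y)   # learns from inputs AND labels
print(model.predict([[3, 2, 1550]]))   # predicted price for an unseen house
```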

🔍 Unsupervised Learning

Definition:
The model learns patterns from unlabeled data — it finds structure or groupings on its own.

Goal:
Explore data and find hidden patterns or groupings.

Examples:

  • Customer segmentation (group customers by behavior)
  • Anomaly detection (detect fraud)
  • Topic modeling (find topics in articles)

Types:

  • Clustering: Group similar data points (e.g., K-Means)
  • Dimensionality Reduction: Simplify data (e.g., PCA)

Requires Labels? ❌ No

Example Dataset:

| Input Features |
| --- |
| Age: 25, Spent: $200 |
| Age: 40, Spent: $800 |

(The model might discover two customer groups: low-spenders vs high-spenders)
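
Here's what that discovery might look like in code: a minimal K-Means sketch in Python with scikit-learn, run on a made-up extension of the age/spend data above. No labels are supplied anywhere; the algorithm finds the two groups itself.

```python
from sklearn.cluster import KMeans

# Unlabeled customers: [age, amount spent]. There is no "correct answer" column.
X = [[25, 200], [27, 180], [40, 800], [42, 760], [23, 220], [45, 900]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # e.g., [0 0 1 1 0 1]: low- vs. high-spenders
print(kmeans.cluster_centers_)   # the center of each discovered group
```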


✅ Quick Comparison

| Feature | Supervised Learning | Unsupervised Learning |
| --- | --- | --- |
| Labels | Required | Not required |
| Goal | Predict outputs | Discover patterns |
| Output | Known | Unknown |
| Examples | Classification, Regression | Clustering, Dimensionality Reduction |
| Algorithms | Linear Regression, SVM, Random Forest | K-Means, PCA, DBSCAN |

Supervised Learning Use Cases

1. Email Spam Detection

  • ✅ Label: Spam or Not Spam
  • 📍 Tech companies like Google use supervised models to filter email inboxes (a minimal sketch follows this list).

2. Fraud Detection in Banking

  • ✅ Label: Fraudulent or Legitimate transaction
  • 🏦 Banks use models trained on historical transactions to flag fraud in real-time.

3. Loan Approval Prediction

  • ✅ Label: Approved / Rejected
  • 📊 Based on income, credit history, and employment data, banks decide whether to approve loans.

4. Disease Diagnosis

  • ✅ Label: Disease present / not present
  • 🏥 Healthcare systems train models to detect diseases like cancer using medical images or lab reports.

5. Customer Churn Prediction

  • ✅ Label: Will churn / Won’t churn
  • 📞 Telecom companies predict if a customer is likely to cancel a subscription based on usage data.
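
As promised in use case 1, here's a minimal spam-detection sketch in Python with scikit-learn. The four training emails are invented; a production filter would learn from millions of labeled messages.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled training data: each email comes with its correct label.
emails = ["Free offer now", "Win a prize today", "Meeting at 3pm", "Lunch tomorrow?"]
labels = ["spam", "spam", "not spam", "not spam"]

# Bag-of-words features feeding a logistic regression classifier.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(emails, labels)
print(clf.predict(["Claim your free prize"]))   # likely ['spam']
```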

🔍 Unsupervised Learning Use Cases

1. Customer Segmentation

  • ❌ No labels — model groups customers by behavior or demographics.
  • 🛒 E-commerce platforms use this for targeted marketing (e.g., Amazon, Shopify).

2. Anomaly Detection

  • ❌ No labeled “anomalies” — model detects outliers.
  • 🛡️ Used in cybersecurity to detect network intrusions or malware (a minimal sketch follows this list).

3. Market Basket Analysis

  • ❌ No prior labels — finds item combinations frequently bought together.
  • 🛍️ Supermarkets like Walmart use this to optimize product placement.

4. Topic Modeling in Text Data

  • ❌ No labels — model finds topics in documents or articles.
  • 📚 News agencies use it to auto-categorize stories or summarize themes.

5. Image Compression (PCA)

  • ❌ No labels — model reduces dimensionality.
  • 📷 Used in storing or transmitting large image datasets efficiently.
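
As promised in use case 2, here's a minimal anomaly-detection sketch in Python using scikit-learn's IsolationForest. The transaction amounts are invented; note that no row is ever labeled as an anomaly.

```python
from sklearn.ensemble import IsolationForest

# Unlabeled transaction amounts; one value is clearly out of pattern.
X = [[20], [22], [19], [21], [23], [500]]

detector = IsolationForest(contamination=0.2, random_state=0).fit(X)
print(detector.predict(X))   # 1 = normal, -1 = anomaly; the 500 should flag
```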

🚀 In Summary:

| Industry | Supervised Example | Unsupervised Example |
| --- | --- | --- |
| Finance | Loan approval | Fraud pattern detection |
| Healthcare | Diagnosing diseases from scans | Grouping patient records |
| E-commerce | Predicting purchase behavior | Customer segmentation |
| Cybersecurity | Predicting malicious URLs | Anomaly detection in traffic logs |
| Retail | Forecasting sales | Market basket analysis |

Training, Validation, and Test Data in Machine Learning

Training Data

  • Purpose: Used to teach (train) the model.
  • Contents: Contains both input features and corresponding output labels (in supervised learning).
  • Usage: The model learns patterns, relationships, and parameters from this data.
  • Size: Typically the largest portion of the dataset (e.g., 70–80%).

Example:
If you’re training a model to recognize handwritten digits:

  • Input: Images of digits
  • Label: The digit (0–9)

Test Data

  • Purpose: Used to evaluate how well the model performs on unseen data.
  • Contents: Same format as training data (features + labels), but not used during training.
  • Usage: Helps assess model accuracy, generalization, and potential overfitting.
  • Size: Smaller portion of the dataset (e.g., 20–30%).

Key Point: It simulates real-world data the model will encounter in production.

Validation Data

  • Purpose: Used to tune the model’s hyperparameters and monitor performance during training.
  • Contents: Same format as training/test data — includes input features and labels.
  • Usage:
    • Helps choose the best version of the model (e.g., best number of layers, learning rate).
    • Detects overfitting early by evaluating on data not seen during weight updates.
  • Not used to directly train the model (no weight updates from validation data).

Summary Table

| Aspect | Training Data | Validation Data | Test Data |
| --- | --- | --- | --- |
| Used for | Training model | Tuning model | Final evaluation |
| Used during | Model training | Model training | After model training |
| Updates model? | Yes | No | No |
| Known to model | Yes | Seen during training | Never seen before |
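
A minimal sketch of producing all three splits in Python with scikit-learn (the 60/20/20 proportions are just one common choice):

```python
from sklearn.model_selection import train_test_split

X = [[v] for v in range(100)]    # stand-in features
y = [v % 2 for v in range(100)]  # stand-in labels

# First carve out the test set (20%), then split the remainder into
# training (60% of the total) and validation (20% of the total).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 60 20 20
```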

Tip:

In practice, for small datasets, we often use cross-validation, where the validation set rotates among the data to make the most of limited samples.
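
For example, here's a minimal 5-fold cross-validation sketch in Python with scikit-learn, using the small built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)   # 150 samples: a genuinely small dataset

# Each of the 5 folds takes one turn as the validation set.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())        # per-fold accuracy and the average
```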

Typical Size Ranges for Small Datasets

| Dataset Type | Number of Samples (Roughly) |
| --- | --- |
| Very Small | < 500 samples |
| Small | 500 – 10,000 samples |
| Medium | 10,000 – 100,000 samples |
| Large | 100,000+ samples |

Why Size Matters

  • Small datasets are more prone to:
    • Overfitting – model memorizes data instead of learning general patterns.
    • High variance in performance depending on the data split.
  • Big models (e.g., deep neural networks) usually need large datasets to perform well.

💡 Common Examples

  • Medical diagnosis: Often < 5,000 patient records → small dataset.
  • NLP for niche domains: < 10,000 labeled texts → small.
  • Handwritten digit dataset (MNIST): 60,000 training images → medium-sized.

🔁 Tip for Small Datasets

If your dataset is small:

  1. Use cross-validation (like 5-fold or 10-fold).
  2. Consider simpler models (e.g., logistic regression, decision trees).
  3. Use data augmentation (e.g., rotate/scale images, reword texts); see the sketch after this list.
  4. Apply transfer learning if using deep learning (e.g., pre-trained models like BERT, ResNet).
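
As a tiny illustration of point 3, here's a sketch of image augmentation in Python using plain NumPy flips and rotations. The 8×8 array stands in for a real image; libraries such as torchvision or albumentations offer far richer transforms.

```python
import numpy as np

img = np.arange(64).reshape(8, 8)   # stand-in for a real grayscale image

augmented = [
    np.fliplr(img),   # horizontal flip
    np.flipud(img),   # vertical flip
    np.rot90(img),    # 90-degree rotation
]
print(len(augmented), "extra training samples from one original")
```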