AI Data Laundromat Categorization from Lin Hsin Hsin Artificial Intelligence Center -- the founder of LIN HSIN HSIN ART MUSEUM -- Digital Art Museum, First Virtual Museum in the World, 1994. Wikipedia, Digital Media Center: Technology, Digital Art, Digital Paintings, Digital Sculptures, Digital Music, Digital Musical Instruments, Sound, Poetry, Animated Music, Web-enabled, Interactive, Digital Media Pioneer

Lin Hsin Hsin Artificial Intelligence Center

AI Data Laundromat Categorization
by Lin Hsin Hsin

🧼🛁🚿🧹 🧽🧻🧺

Missing Data

Identifying & addressing missing values

📍 Imputation -- eg mean, median, or regression-based
📍 Deletion
📍 Flagging
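
The three options above can be sketched in a few lines of Python. This is a minimal illustration, assuming missing values are represented as None; the function name handle_missing and the sample list are hypothetical, and regression-based imputation is not shown.

```python
from statistics import mean, median

def handle_missing(values, strategy="mean"):
    """Return (cleaned_values, flags). None marks a missing value (an assumption)."""
    present = [v for v in values if v is not None]
    # Flagging: remember which positions were missing in the original data.
    flags = [v is None for v in values]
    if strategy == "delete":
        # Deletion: drop the missing entries altogether.
        return present, flags
    # Imputation: fill the gaps with the mean or median of the observed values.
    fill = mean(present) if strategy == "mean" else median(present)
    return [fill if v is None else v for v in values], flags

ages = [20, None, 40, 30, None]
imputed, flags = handle_missing(ages, "mean")
deleted, _ = handle_missing(ages, "delete")
```

Keeping the flags alongside the imputed values lets a later analysis tell observed values apart from filled-in ones.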

Duplicates

Detecting & eliminating duplicate records or entries to avoid redundancy & bias
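
A minimal sketch of duplicate removal, assuming records are dictionaries and that two records count as duplicates when the chosen key fields match after trimming spaces and ignoring case. The function name drop_duplicates and the sample rows are hypothetical.

```python
def drop_duplicates(records, key_fields):
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for rec in records:
        # Compare on a trimmed, lower-cased key so trivial variants collapse.
        key = tuple(str(rec[f]).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

rows = [
    {"name": "Ana Lee", "email": "ana@example.com"},
    {"name": "ana lee ", "email": "ANA@example.com"},
    {"name": "Ben Ong", "email": "ben@example.com"},
]
deduped = drop_duplicates(rows, ["name", "email"])
```

Here the second row is treated as a repeat of the first, so two records remain.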

Outlier Detection

Identifying & handling data points that deviate significantly from the rest of the data, to prevent skewing during an analysis. Note that these data points may result from errors or may represent rare but valid observations. This is performed through statistical methods such as the interquartile range or adjusted boxplots
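
The interquartile range rule can be sketched as follows. This is an illustration only: the multiplier k = 1.5 is the common convention rather than something stated above, the function name iqr_outliers and the sample data are hypothetical, and adjusted boxplots are not shown.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 95]
suspects = iqr_outliers(data)
```

The function only reports candidates. Whether a reported point is an error or a rare but valid observation still needs a separate decision.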

Inconsistencies

Correcting inconsistencies in data

📍 Mismatched categories
📍 Typos
📍 Conflicting entries
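
One possible way to handle these cases, sketched with Python's standard difflib module. The canonical category list, the 0.8 similarity cutoff, and the function names are assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical list of accepted category names.
CANONICAL = ["singapore", "malaysia", "indonesia"]

def fix_category(value, cutoff=0.8):
    """Map a mistyped or mismatched category to its closest canonical name."""
    cleaned = value.strip().lower()
    match = get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else None  # None means no confident match

def conflicting_ids(records, id_field, value_field):
    """Return ids that appear with more than one distinct value."""
    seen = {}
    for rec in records:
        seen.setdefault(rec[id_field], set()).add(rec[value_field])
    return sorted(k for k, vals in seen.items() if len(vals) > 1)

people = [
    {"id": 1, "birth_year": 1990},
    {"id": 1, "birth_year": 1991},
    {"id": 2, "birth_year": 1985},
]
conflicts = conflicting_ids(people, "id", "birth_year")
```

Values that return None, and ids that come back as conflicts, are best reviewed by hand rather than corrected automatically.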

Standardization

Converting data to a common format

📍 dates -- eg 01/01/2000 vs 01.01.2000
📍 units
📍 categories -- eg Yes, yes, or Y
📍 text case
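
A minimal sketch of the conversions listed above. It assumes day-first dates, ISO 8601 (YYYY-MM-DD) as the common date format, and metres as the common unit; all names and lookup tables are hypothetical.

```python
from datetime import datetime

# Assumed input formats, all day-first; extend as needed.
DATE_FORMATS = ["%d/%m/%Y", "%d.%m.%Y", "%Y-%m-%d"]
YES_NO = {"yes": "Yes", "y": "Yes", "no": "No", "n": "No"}
TO_METRES = {"m": 1.0, "cm": 0.01, "km": 1000.0}

def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def standardize_yes_no(text):
    """Collapse Yes / yes / Y style variants; text case is ignored."""
    return YES_NO.get(text.strip().lower())

def to_metres(value, unit):
    """Convert a length to metres using the assumed unit table."""
    return value * TO_METRES[unit.strip().lower()]
```

With these assumptions, both 01/01/2000 and 01.01.2000 become 2000-01-01, and Yes, yes, and Y all become Yes.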

Normalization

Scaling numerical data to a standard range (eg 0 -- 1) for fair comparison.
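
Min-max scaling is one common way to reach the 0 -- 1 range. The sketch below is illustrative; the function name is hypothetical, and returning all zeros for constant input is an assumed convention.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; avoid dividing by zero (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30])
```

For the sample input the result is 0.0, 0.5, and 1.0.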

Data Type Conversion

Ensuring data is in the correct format (eg string to numeric, categorical to numerical)
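
Both conversions can be sketched briefly. The example assumes numeric strings may contain thousands separators, and it encodes categories as integer codes in order of first appearance; the function names are hypothetical.

```python
def to_number(text):
    """Convert a numeric string to float; return None if it cannot be parsed."""
    try:
        return float(text.replace(",", "").strip())
    except ValueError:
        return None

def encode_categories(values):
    """Assign each distinct category an integer code in order of appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes

encoded, mapping = encode_categories(["red", "blue", "red", "green"])
```

Integer codes imply an ordering that the categories may not have, so other encodings can be more suitable depending on the analysis.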

Text Cleaning

Processing text data, eg

📍 Removing stopwords
📍 Stemming
📍 Correcting spelling
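
A rough sketch of the first two steps. The stopword set is a tiny hypothetical sample, and the stemmer is a naive suffix stripper used only for illustration, not an established stemming algorithm. Spelling correction is not shown.

```python
import re

# Small illustrative stopword set; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip one common suffix, keeping at least three letters of the word."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def clean_text(text):
    """Lower-case, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

tokens = clean_text("The cats are jumping in the garden")
```

The sample sentence reduces to the tokens cat, jump, and garden.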

Validation

Checking data against rules or constraints to ensure the following

Accuracy

This evaluates whether the data correctly represents the real-world entities or events it is meant to describe. Accuracy is often verified by cross-checking with external sources or domain knowledge

Completeness

This assesses whether all required data fields are filled and whether the dataset contains sufficient information for analysis
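
Completeness can be measured as the share of records in which each required field is filled. The sketch below assumes records are dictionaries and treats None, an empty string, or an absent key as unfilled; the function name and sample data are hypothetical.

```python
def completeness_report(records, required):
    """Return, per required field, the fraction of records that fill it."""
    total = len(records)
    report = {}
    for field in required:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / total if total else 0.0
    return report

rows = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "email": ""},
    {"name": None, "email": "cy@example.com"},
    {"name": "Dee"},
]
report = completeness_report(rows, ["name", "email"])
```

For these rows the name field is 75% complete and the email field is 50% complete.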

Conformance

This checks that the data conforms to predefined rules or constraints, eg

📍 A phone number must follow a specific format
📍 A date must be within a valid range
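
The two example rules can be sketched as simple checks. The phone pattern (a +65 prefix followed by two groups of four digits) and the date bounds are assumptions chosen for illustration; real rules depend on the dataset.

```python
import re
from datetime import date

# Assumed format: "+65 1234 5678". Replace with the format your data requires.
PHONE_PATTERN = re.compile(r"\+65 \d{4} \d{4}")

def valid_phone(text):
    """True if the whole string matches the assumed phone format."""
    return PHONE_PATTERN.fullmatch(text) is not None

def valid_date(d, earliest=date(1900, 1, 1), latest=date(2100, 12, 31)):
    """True if the date lies within the assumed valid range (inclusive)."""
    return earliest <= d <= latest
```

Records that fail a check can be corrected, flagged, or rejected, depending on how strict the pipeline needs to be.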

Feature Engineering

Creating new features or modifying existing ones to improve data utility for analysis
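
A small sketch of deriving new features from existing fields. The order record and its field names (total, quantity, order_date) are hypothetical.

```python
from datetime import date

def add_features(order):
    """Return a copy of the record with two derived features added."""
    enriched = dict(order)
    # Ratio feature: price per item, derived from two existing fields.
    enriched["unit_price"] = order["total"] / order["quantity"]
    # Calendar feature: weekday() returns 5 for Saturday and 6 for Sunday.
    enriched["is_weekend"] = order["order_date"].weekday() >= 5
    return enriched

order = {"total": 30.0, "quantity": 4, "order_date": date(2000, 1, 1)}
enriched = add_features(order)
```

The original record is left unchanged, so raw and derived values can be compared later.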