Data Engineering · Tool

DataForge: synthetic dataset generator for Power BI & Analytics

Generate realistic synthetic datasets with smart schemas, advanced data quality controls, and Power BI-ready exports, with no real data needed. Pick an industry theme, configure your tables, inject controlled quality issues at global or column level, and review an inline data quality report before exporting everything in seconds.

Try it live

DataForge is fully interactive below: pick a theme, configure your tables, and generate a dataset right here.


The problem

You're building a Power BI dashboard or testing an analytics pipeline, but you don't have access to production data. Maybe it's locked behind compliance, maybe it doesn't exist yet, or maybe you just need a safe sandbox to experiment. So you spend hours manually crafting CSV files, inventing fake names and dates, and trying to make the relationships between tables actually make sense. The result? A fragile, unrealistic dataset that barely tests anything.

The solution

DataForge generates production-quality fictitious datasets in seconds. Pick one of 20+ industry themes (HR, Finance, Healthcare, Retail, and more) and DataForge auto-generates a complete schema with table names, column types, primary/foreign key relationships, and realistic data powered by Faker. Need to test how your pipeline handles messy data? Configure data quality rules at global or column level across six dimensions (accuracy, completeness, consistency, timeliness, validity, and uniqueness) to inject nulls, duplicates, inconsistencies, and more, then review an inline quality report that tells you exactly what was affected. Export as CSV, JSON, or Parquet, and import straight into Power BI with a ready-made schema file.

🎯
Who is this for?

Built for anyone who needs realistic test data for Power BI, Excel, or analytics projects, whether you're a data analyst, a student learning dashboards, or a team lead prototyping a report.

🛡️

No real data required

Skip the compliance hurdles and data access requests. Generate realistic test datasets that behave like production data without any privacy concerns.

🧪

Built-in data quality rules

Define quality rules at global level or per column: inject nulls, duplicates, stale timestamps, and invalid entries across six dimensions to stress-test your dashboards with precision.

📊

Inline data quality report

After generation, review a detailed quality report that shows exactly which issues were injected, which columns were affected, and the overall quality score, so you know what to expect before importing.

From zero to dashboard in minutes

Pick a theme, generate, export, import into Power BI. What used to take hours of manual data prep now takes under a minute.


How it works

1

Pick a theme

Choose from 20+ industry themes using the searchable dropdown. Each theme comes with pre-built table schemas and realistic column definitions.

2

Configure your tables

Select how many tables to generate and set the number of rows per table. DataForge handles all primary key and foreign key relationships automatically.
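
To make the idea of automatic key handling concrete, here is a minimal sketch of how a parent table and a child table can be generated so that every foreign key resolves. It is not DataForge's actual engine (which the page says uses Faker and NumPy); it uses only the standard library, and the table names, column names, and the `generate_tables` helper are hypothetical.

```python
import random


def generate_tables(n_customers: int, n_orders: int, seed: int = 42):
    """Build a parent table and a child table whose foreign keys all resolve."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data

    # Parent table: sequential integers serve as primary keys.
    customers = [
        {"customer_id": i, "segment": rng.choice(["Retail", "SMB", "Enterprise"])}
        for i in range(1, n_customers + 1)
    ]

    # Child table: each foreign key is drawn from the parent's existing keys,
    # so referential integrity holds by construction.
    parent_keys = [c["customer_id"] for c in customers]
    orders = [
        {
            "order_id": i,
            "customer_id": rng.choice(parent_keys),
            "amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(1, n_orders + 1)
    ]
    return customers, orders


customers, orders = generate_tables(n_customers=5, n_orders=20)
```

Generating the parent first and sampling child keys from it is the simplest way to guarantee valid relationships; a real generator would extend the same ordering to longer chains of tables.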

3

Inject data quality issues

Configure quality rules at global level (applied across all columns) or column level (targeting specific columns with specific rules). Choose from six dimensions: accuracy, completeness, consistency, timeliness, validity, and uniqueness.
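
The difference between a global rule and a column-level rule can be sketched as follows. This is an illustration under assumptions, not DataForge's implementation: the function names, the `rate` parameter, and the row format are hypothetical. It covers two issue types, nulls (a completeness issue) and duplicate rows (a uniqueness issue).

```python
import copy
import random


def inject_nulls(rows, rate, columns=None, seed=0):
    """Return a copy of rows with roughly `rate` of the targeted cells set to None.

    columns=None stands in for a global rule (every column is a target);
    a list of column names stands in for a column-level rule.
    """
    rng = random.Random(seed)
    result = copy.deepcopy(rows)  # never mutate the clean dataset
    affected = []
    for idx, row in enumerate(result):
        targets = columns if columns is not None else list(row)
        for col in targets:
            if rng.random() < rate:
                row[col] = None
                affected.append((idx, col))
    return result, affected


def inject_duplicates(rows, rate, seed=0):
    """Return a new list with int(len(rows) * rate) randomly chosen rows repeated."""
    rng = random.Random(seed)
    k = int(len(rows) * rate)
    return rows + [dict(row) for row in rng.sample(rows, k)]


rows = [{"id": i, "email": f"user{i}@example.com", "city": "Lyon"} for i in range(100)]

# Column-level rule: only the email column may lose values.
dirty, affected = inject_nulls(rows, rate=0.2, columns=["email"], seed=1)

# Global rule: any cell in any column may lose its value.
dirty_global, affected_global = inject_nulls(rows, rate=0.05, seed=1)

# Uniqueness rule: repeat 10% of the rows.
with_dupes = inject_duplicates(rows, rate=0.1, seed=1)
```

Returning the list of affected cells alongside the data is what makes a later quality report possible without re-scanning the dataset.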

4

Generate & export

Hit generate, preview the data, and export as CSV, JSON, or Parquet, either as single tables or as a full ZIP archive ready for Power BI.
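
As a rough sketch of the multi-table export, the snippet below packs each table into one in-memory ZIP archive as both CSV and JSON using only the standard library. Parquet is left out because it needs a third-party library such as pyarrow. The `export_zip` helper and the file naming scheme are assumptions, not DataForge's actual archive layout.

```python
import csv
import io
import json
import zipfile


def export_zip(tables: dict[str, list[dict]]) -> bytes:
    """Pack each table as <name>.csv and <name>.json into one ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in tables.items():
            # CSV: the header comes from the first row's keys.
            fieldnames = list(rows[0]) if rows else []
            text = io.StringIO()
            writer = csv.DictWriter(text, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
            archive.writestr(f"{name}.csv", text.getvalue())

            # JSON: one array of row objects per table.
            archive.writestr(f"{name}.json", json.dumps(rows, indent=2))
    return buffer.getvalue()


tables = {
    "customers": [{"customer_id": 1, "segment": "SMB"}],
    "orders": [{"order_id": 1, "customer_id": 1, "amount": 19.99}],
}
payload = export_zip(tables)

# Reading the archive back confirms what was written.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = sorted(archive.namelist())
    orders_back = json.loads(archive.read("orders.json"))
```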

5

Review the data quality report

After generation, an inline report breaks down every injected issue: which quality dimensions were affected, which columns were targeted, and the overall data quality score. Use it to validate your test scenarios before importing into Power BI.
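
The page does not say how the overall quality score is calculated, so the sketch below assumes one plausible formula: the average of a completeness ratio (non-null cells) and a uniqueness ratio (rows whose key has not appeared before), scaled to 100. The `quality_report` function and its output fields are hypothetical.

```python
def quality_report(rows, key):
    """Summarize nulls per column, duplicate keys, and an assumed overall score."""
    total_cells = sum(len(row) for row in rows)

    # Completeness: count missing values per column.
    null_by_column = {}
    for row in rows:
        for col, value in row.items():
            if value is None:
                null_by_column[col] = null_by_column.get(col, 0) + 1
    null_cells = sum(null_by_column.values())

    # Uniqueness: count rows whose key value was already seen.
    seen = set()
    duplicate_rows = 0
    for row in rows:
        if row[key] in seen:
            duplicate_rows += 1
        else:
            seen.add(row[key])

    completeness = 1 - null_cells / total_cells if total_cells else 1.0
    uniqueness = 1 - duplicate_rows / len(rows) if rows else 1.0

    return {
        "null_by_column": null_by_column,
        "duplicate_rows": duplicate_rows,
        # Assumed scoring: equal-weight average of the two ratios.
        "score": round(100 * (completeness + uniqueness) / 2, 2),
    }


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "c@example.com"},
]
report = quality_report(rows, key="id")
# 1 null out of 8 cells and 1 duplicate out of 4 rows give a score of 81.25.
```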


Key features

20+ industry themes

HR, Finance, Retail, Healthcare, Education, Manufacturing, Logistics, Cybersecurity, and more, each with tailored schemas and realistic data patterns.

Smart schema generation

Auto-generated table names, column types, and PK/FK relationships that mirror real-world data models. No manual wiring needed.

Advanced data quality controls

Define rules at global or column level across six quality dimensions. After generation, review an inline data quality report that details every injected issue and its impact on your dataset.

Power BI ready

Export includes a schema JSON with relationship definitions, so you can import directly into Power BI's semantic model without manual setup.
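
The exact layout of the schema file is not documented here, so the following only illustrates what a schema JSON with relationship definitions could look like, plus a quick check that every relationship points at a real column. All field names (`tables`, `relationships`, `from_table`, `cardinality`, and so on) are assumptions.

```python
import json

# Hypothetical schema layout: tables with typed columns, plus relationships.
schema = {
    "tables": [
        {
            "name": "customers",
            "primary_key": "customer_id",
            "columns": {"customer_id": "int", "segment": "string"},
        },
        {
            "name": "orders",
            "primary_key": "order_id",
            "columns": {"order_id": "int", "customer_id": "int", "amount": "decimal"},
        },
    ],
    "relationships": [
        {
            "from_table": "orders",
            "from_column": "customer_id",
            "to_table": "customers",
            "to_column": "customer_id",
            "cardinality": "many-to-one",
        }
    ],
}


def validate_relationships(schema):
    """Return a list of problems; an empty list means every reference resolves."""
    columns = {t["name"]: set(t["columns"]) for t in schema["tables"]}
    problems = []
    for rel in schema["relationships"]:
        for side in ("from", "to"):
            table, column = rel[f"{side}_table"], rel[f"{side}_column"]
            if column not in columns.get(table, set()):
                problems.append(f"{table}.{column} not found")
    return problems


problems = validate_relationships(schema)
schema_json = json.dumps(schema, indent=2)  # text that would be written to the file
```

Validating the relationships before import catches a misspelled table or column name early, rather than when the model relationships are being set up in Power BI.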


Tech stack

  • Frontend: Next.js with a dark-themed, responsive UI.
  • Backend: FastAPI (Python) powering the data generation engine.
  • Data generation: Faker + NumPy for realistic, randomized data with proper relationships.
  • Export: CSV, JSON, and Parquet formats with ZIP archive support.
  • Deployment: Hosted on Vercel (frontend) with a separate backend service.