Dedupe-csv: Remove Duplicates from CSV with the Command Line


What is the dedupe-csv tool?

Available on NPM, dedupe-csv is a command line tool that reads a CSV file, scans for duplicates and exports the unique entries to a new CSV.

Based on Pandas’ drop_duplicates function, it provides a rich set of features with intuitive syntax and the convenience of a command line utility.

How to use dedupe-csv

As with our last NPM module, Pivot-Table-JS, make sure you’ve installed Node.js and then add the dedupe-csv package:

npm i -g dedupe-csv

Once installed, the command-line tool can be used from the terminal of your choosing, such as Command Prompt, PowerShell or the Node.js command prompt on Windows, and iTerm2 or Terminal on macOS/Linux.

Just refer to the package name, followed by the CSV file you wish to deduplicate:

dedupe-csv file="data.csv"

It’s important to note that the file parameter is resolved relative to your current working directory in the terminal. For example, if your terminal is currently in Documents and your CSV file is in a subdirectory called project, you’ll need to include that subdirectory in the path:

dedupe-csv file="project/data.csv"

The command will read the file, look for identical rows and remove duplicates, keeping only the first occurrence of each duplicated row.

[Screenshot: dedupe-csv processing a file]

When finished, it will output your results to a new file, keeping the original file unedited.

[Screenshot: dedupe-csv outputting a file]
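
Since the tool is modelled on Pandas’ drop_duplicates, it can help to see roughly what this default behaviour looks like in Pandas itself. The snippet below is only an illustrative sketch, not the package’s actual source; it assumes the same data.csv input and data_deduped.csv output names used in this post.

import pandas as pd

# Read the CSV, drop fully identical rows (keeping the first occurrence found),
# then write the result to a new file so the original is left untouched.
df = pd.read_csv("data.csv")
df.drop_duplicates().to_csv("data_deduped.csv", index=False)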

Other options

Remove duplicates by column

If you want to fine-tune the way the tool finds duplicates, you can use the column option to select a specific header (or headers) for the evaluation.

Here, we have a CSV called data.csv, where the brand column contains two duplicated values (Yum and Foo each appear twice), the style column is identical on every row, and rating has no duplicates.

brand,style,rating
Yum,cup,1
Yum,cup,2
Foo,cup,6
Foo,cup,8

If we want to check for duplicates within the brand column only, we can run this command:

dedupe-csv file="data.csv" column='brand'

Which will produce a file called data_deduped.csv containing:

brand,style,rating
Yum,cup,1
Foo,cup,6

To evaluate based on a sequence of columns, add as many headers as you want, separated by commas.

So given this file data.csv:

brand,style,rating
Yum,cup,1
Yum,cup,2
Foo,cup,3
Foo,pack,1
Foo,pack,5

You can enter:

dedupe-csv file="data.csv" column='brand, style'

Which will return:

brand,style,rating
Yum,cup,1
Foo,cup,3
Foo,pack,1
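
For readers who know Pandas, the column option maps roughly onto the subset argument of drop_duplicates; a single column works the same way with subset=["brand"]. Again, this is only an illustrative sketch rather than the package’s own implementation:

import pandas as pd

# Rows count as duplicates only when both 'brand' and 'style' match;
# the first occurrence of each combination is kept.
df = pd.read_csv("data.csv")
df.drop_duplicates(subset=["brand", "style"]).to_csv("data_deduped.csv", index=False)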

Remove duplicates but keep last value

In all these examples, we’ve been keeping the first entry in each set of duplicates. If we want to keep the last entry instead, we can specify the keep option.

For example, using this data.csv and command:

brand,style,rating
Yum,cup,1
Yum,cup,2
Foo,pack,1
Foo,pack,5

dedupe-csv file="data.csv" column='brand' keep='last'

Will return a data_deduped.csv containing:

brand,style,rating
Yum,cup,2
Foo,pack,5
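
The rough Pandas equivalent of the keep option is the keep argument of drop_duplicates; once more, this is an illustrative sketch rather than the tool’s actual code:

import pandas as pd

# With keep="last", the final occurrence of each duplicated brand is kept
# instead of the first.
df = pd.read_csv("data.csv")
df.drop_duplicates(subset=["brand"], keep="last").to_csv("data_deduped.csv", index=False)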

It’s my hope that the SEO and wider digital marketing community find this module a useful addition to their toolkit.

If you have any questions about using the module or want to learn more about how Builtvisible build custom solutions to help businesses maximise their marketing spend, please get in touch.


