qsv is a command line program for indexing, slicing, analyzing, splitting, enriching,
validating & joining CSV files. Commands are simple, fast and composable:
- Simple tasks are easy.
- Performance trade offs are exposed in the CLI interface.
- Composition does not come at the expense of performance.
NOTE: qsv is a fork of the popular xsv utility, merging several pending PRs since xsv 0.13.0's release, along with additional features & commands for data-wrangling. See FAQ for more details. (NEW and EXTENDED commands are marked accordingly).
Command | Description |
---|---|
apply | Apply series of string, date, currency & geocoding transformations to a CSV column. (NEW) |
behead | Drop headers from a CSV. (NEW) |
cat | Concatenate CSV files by row or by column. |
count1 | Count the rows in a CSV file. (Instantaneous with an index.) |
dedup2 | Remove redundant rows. (NEW) |
enum | Add a new column enumerating rows by adding a column of incremental or uuid identifiers. Can also be used to copy a column or fill a new column with a constant value. (NEW) |
exclude1 | Removes a set of CSV data from another set based on the specified columns. (NEW) |
explode | Explode rows into multiple ones by splitting a column value based on the given separator. (NEW) |
fill | Fill empty values. (NEW) |
fixlengths | Force a CSV to have same-length records by either padding or truncating them. |
flatten | A flattened view of CSV records. Useful for viewing one record at a time. e.g. qsv slice -i 5 data.csv | qsv flatten . |
fmt | Reformat a CSV with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.) (EXTENDED) |
foreach | Loop over a CSV to execute bash commands. (*nix only) (NEW) |
frequency13 | Build frequency tables of each column. (Uses parallelism to go faster if an index is present.) |
headers | Show the headers of a CSV. Or show the intersection of all headers between many CSV files. |
index | Create an index for a CSV. This is very quick & provides constant time indexing into the CSV file. |
input | Read a CSV with exotic quoting/escaping rules. |
join1 | Inner, outer, cross, anti & semi joins. Uses a simple hash index to make it fast. (EXTENDED) |
jsonl | Convert newline-delimited JSON to CSV. (NEW) |
lua | Execute a Lua script over CSV lines to transform, aggregate or filter them. (NEW) |
partition | Partition a CSV based on a column value. |
pseudo | Pseudonymise the value of the given column by replacing them with an incremental identifier. (NEW) |
rename | Rename the columns of a CSV efficiently. (NEW) |
replace | Replace CSV data using a regex. (NEW) |
reverse2 | Reverse order of rows in a CSV. (NEW) |
sample1 | Randomly draw rows from a CSV using reservoir sampling (i.e., use memory proportional to the size of the sample). (EXTENDED) |
search | Run a regex over a CSV. Applies the regex to each field individually & shows only matching rows. (EXTENDED) |
searchset | Run multiple regexes over a CSV in a single pass. Applies the regexes to each field individually & shows only matching rows. (NEW) |
select | Select or re-order columns. (EXTENDED) |
slice12 | Slice rows from any part of a CSV. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice). |
sort | Sort CSV data. (EXTENDED) |
split13 | Split one CSV file into many CSV files of N chunks. |
stats123 | Show basic types & statistics of each column in a CSV. (i.e., sum, min/max, min/max length, mean, stddev, variance, quartiles, IQR, lower/upper fences, skew, median, mode, cardinality & nullcount) (EXTENDED) |
table2 | Show aligned output of a CSV using elastic tabstops. (EXTENDED) |
transpose2 | Transpose rows/columns of a CSV. (NEW) |
Binaries for Windows, Linux and macOS are available from Github.
Alternatively, you can compile from source by
installing Cargo
(Rust's package manager)
and installing qsv
using Cargo:
cargo install qsv
Compiling from this repository also works similarly:
git clone git://github.com/jqnatividad/qsv
cd qsv
cargo build --release
The compiled binary will end up in ./target/release/qsv
.
If you want more performance, set this environment variable BEFORE installing/compiling:
On Linux and macOS:
export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'
On Windows Powershell:
$env:CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'
Do note though that the resulting binary will only run on machines with the
same architecture as the machine you installed/compiled from.
To find out your CPU architecture and other valid values for target-cpu
:
rustc --print target-cpus
Some very rough benchmarks of
various qsv
commands.
Dual-licensed under MIT or the UNLICENSE.
qsv was made possible by datHere - Data Infrastructure Engineering.
Standards-based, best-of-breed, open source solutions to make your Data Useful, Usable & Used.
This project is unrelated to Intel's Quick Sync Video.