Charsep
Background info
"Character-Separated-Values" files are widely used on computers. They
are an easy way to provide two-dimensional data structures, and such
structures are so easy to picture, mentally or physically, that they
are the first solution we think of to capture structured information
beyond "lists". Usually, simple text editors are used to quickly edit
CSV files, with the user recreating the "columns" of information
mentally. When the work becomes heavier, users turn to spreadsheet
software (Excel or OpenOffice "Calc"), which is excellent at processing
such grid structures. However, using these spreadsheet tools has some
drawbacks:
- They "translate" CSV files, interpreting values as
datatypes (e.g. dates, numbers...).
- They may have limitations: 256 columns and 65,536
rows for Excel versions prior to Excel 2007. Even with current
releases, because of their rich feature sets, they do not handle
huge files well.
- Designed for calculation, they offer limited CSV-specific
processing features. They do not deal well with data profiling,
structure management (column-based processing), or search/replace.
- Their "native" format is not CSV: you need to import and export
each time you process such a file.
This simple tool is implemented to bridge the gap
between the two types of software (text editors and spreadsheet
processors) for processing CSV files. It does not replace either of
them, but you may find it useful for many tasks. In addition, a
simple command-line processor lets you automate some of the tasks
you may need to run on such files. On my YouTube channel you can
find some videos that give an overview of Charsep's features. The
first two show fundamental concepts; the others go deeper into
specific topics. I will add more tutorials over time.
Some more information on the "CSV" format: CSV is commonly
defined as an acronym for 'Comma-Separated-Values'. However, the
separator is not always a comma, since the comma is not the most
convenient separator and quite often causes issues when it appears
inside values. Hence a better definition is
'Character-Separated-Values'. The format is not a very well-defined
standard (many people "think" they are building CSV files without
following any real standard), but you can look at RFC 4180 for more
information. Your preferred search engine or Wikipedia can also
point you to plenty of documents on the topic.
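To illustrate why a separator that also appears inside values causes trouble, here is a minimal sketch in Python using its standard csv module. It is not part of Charsep, and the sample row is made up for the example. Splitting naively on the comma cuts the quoted value in two, while a quote-aware reader in the spirit of RFC 4180 keeps it whole.

```python
import csv
import io

# Hypothetical sample row: the second field contains the separator itself.
line = 'id,"Doe, John",Paris\r\n'

# Naive approach: splitting on every comma cuts the quoted value in two
# and leaves the quote characters in the data.
naive_fields = line.strip().split(",")

# Quote-aware approach: the csv module treats the quoted text as one field.
parsed_fields = next(csv.reader(io.StringIO(line, newline="")))

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The naive split reports four fields where the file really has three, which is the kind of misalignment that makes the comma a risky separator.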
Some features of Charsep
Charsep is a program developed in Java, using a
"Swing" user interface, and can therefore run on MacOS, Linux, Windows... Below is a non-exhaustive list of features provided by Charsep:
- Direct in-grid editing of data
- Structure management (changing column order, removing/adding columns;
this can be automated, based on 'template structures')
- Support of huge files (in number of columns or number of rows)
- Simple change of column separator (not a plain character replacement,
since the separator may be used inside quoted strings) and of Unix/MS-DOS
row separators (LF or CRLF)
- Support of different charsets including multibyte / unicode and quick conversion of charsets
- Profiling: complete statistics on structure (distinct values in columns,
patterns, min/max/avg size of values per column), extreme values...
Checking that files conform to a defined profile
- Merge of files, comparison ("Diff") of files with a variety of options
(content based or position based, case-sensitive checks or not, similarity of values, etc.)
- Quick append of rows or columns / support of clipboard as a source or target
- Support of headers / headerless files
- Partial load of files (First n rows, Last n rows, from row n to m, from col n
to m, only rows where a column contains/does not contain a specific
value...)
- Detection of corrupted files / correction of files (missing columns)
- 1-click removal of all empty columns
- Easy and rich search functionality: RegEx based or 'starts with',
'contains', 'does not contain', 'is unique/has duplicates', 'matches
another column's values', 'matches a set of values'...
- Composition of searches (And / Or) and appending to the current search set (Boolean operations for selecting search results)
- Search on 1 column or all columns
- Search by similarity with fuzzy-matching algorithms: Jaro-Winkler, Levenshtein edit distance, Soundex
- Search/Replace and transform - to uppercase, lowercase, filter alpha/numeric chars..., or RegEx based
- Quick sort on any column - ascending / descending
- Search in Header labels
- Transposition of files (Switch columns to rows) taking into account headers/first column or not
- Switch column labels to column numbers or alphabetical column names ('Excel format')
- Command-line processing of file transformations
- Generation of random files with many features for columns definition
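The separator-change feature listed above is described as more than a character replacement. The following minimal sketch in Python shows why; it is an illustration only, not Charsep's implementation. The function name `change_separator`, its parameters, and the sample data are all assumptions made for this example. It parses the rows with the old separator and writes them back with the new one, quoting values only where the new separator requires it, and normalizes the row separator at the same time.

```python
import csv
import io

def change_separator(text, old_sep, new_sep, line_ending="\n"):
    """Rewrite CSV text with a new column separator and row separator.

    Hypothetical helper for illustration: values are parsed first, so a
    separator inside a quoted value is data, not a column boundary.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=old_sep)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=new_sep, lineterminator=line_ending,
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(reader)
    return out.getvalue()

# Made-up sample: ';' separated, CRLF rows, one quoted value containing ';'
# and one unquoted value containing ','.
source = 'name;comment\r\n"Doe; John";a,b\r\n'

converted = change_separator(source, ";", ",")   # parse, then re-write
naive = source.replace(";", ",")                 # plain character replacement

print(repr(converted))
print(repr(naive))
```

In this sketch the plain replacement alters the value `Doe; John` and turns `a,b` into two columns, whereas the parse-and-rewrite approach keeps both values intact and still yields two columns per row.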
Please see the Help page for more information on Charsep.