<< back to Charsep main page

Charsep     Download     Releases     Help     Disclaimer

You will easily find by yourself how the tool work. But here are some basic hints.

The tool allows to load a file, display/edit it in a matrix, and save it. Each time you open a file you provide parameters, like the file path, separator character, the charset, etc... and each time you save it you provide similar parameters (like in a "Save as..." dialog in common applications). The application always starts by opening an "open file" dialog, then displays the main screen, showing a menu and a toolbar on top, the matrix, and an event log at the bottom.

You can find on my YouTube channel, some videos that provide an overview of features of Charsep. The two first ones show fundamental concepts, the other ones go deeper into specific topics.


Help on specific topics
  1. Options from the "File" menu
    1. Open a file
    2. Open selected rows
    3. Save a file
    4. Merge with file
    5. Generate a random grid
  2. Edit menu commands
    1. Find
    2. Search and replace
    3. File Compare
    4. Grid profile validation
  3. Structure menu commands
    1. Edit structure
    2. Transpose grid
    3. Insert Columns from clipboard
    4. Empty columns deletion
    5. Last selected column deletion
  4. The toolbar
  5. Contextual menu commands
  6. Command-line processing

Options from the “File” menu 

Renumber rows in grid

First column of matrix is not coming from the source file but displays the row position. It stays stable when a sorting is applied so you can revert back to file-order of rows. This “renumber” option allows you to "renumber" the rows of the grid, after a processing occurred - insertion and deletion or reordering of rows.

Grid Information

The "Grid information" command provides information on the content of the current matrix : number of rows, number of columns, number of selected rows, number of characters...  For each column it also provides the minimum, maximum and average number of chars for not null values - and the number of values for this column. The last column displays the row position (Or number of rows if more than one) on which this column has a max or min size. Since you can sort this information or search (Contextual search with right-click) you can discover quite a lot of information on your file...  Double-clicking information about a specific column provides the “profile” of this column – distribution factor and list of distinct values, and for numeric columns sum, standard deviation, mean and median value.. Similar information can be obtained through the “Get selected cols profile” button to get information across more than one column.

Open file...

You can either load from a file (enter path and file name or click on '...' to browse your volumes) or user your clipboard content to create a new file. Selecting the "encoding" allows to load files stored using different charsets (If you have no idea - most files are now using UTF8). The "separator" allows you to define which character in the file is separating the different columns. By ticking '1st row is Headers', you indicate to Charsep that the headers of the matrix will be based on the first row of the file - if unticked, the headers will just indicate a column sequence number. By ticking 'Double-quotes can encapsulate separator', the file will load as a 'csv' file should do, that is, separators wrapped in double-quotes will not be counted and will be displayed in cell content. When processing large files, it may be interesting to filter out some information and load only partially the file.
For this you have different filtering options :

  • Loading only rows for which a column contains a specific value, as a whole (exact match) or partially (contains).
  • On the contrary, loading only rows excluding a specific value as a whole or partially is also possible. In addition, you can also easily limit the load to some rows or some columns, by selecting to load
  • All rows, or only the n first or last rows of the files, or rows from n to m.
  • All columns, or only the n first or last columns of the files, or columns from n to m.

These filtering options allow to view and process huge files - but of course if you then "save" your work, only this partial information will get saved.
The tool is robust enough to cope with files that are not completely structured (missing columns on some rows). Charsep will display the list of rows for which the number of columns is incorrect after loading them. Just saving it is enough to correct such errors in a file.

Open selected rows

This option allows to open a new instance of CharSep main dialog, with the rows that are currently selected from this dialog. This is helpful to handle different processes on different subsets of rows you have been selecting previously.

Save file..

Very similar to the "Open file" dialog options - it will allow you to change the separator or the character set of a file in a snap. The "Save file" also allows to save the matrix content as a text in the clipboard (In such a case UTF-8 is the standard character set that cannot be changed).
An option allows you to either store the matrix headers as the first row in the file or skip it. In this dialog you can also decide to only save a file with the rows selected in the matrix, or just an "excerpt" - the first n rows of the matrix, although commonly you will work with the full matrix, add/rearrange/delete rows as required, then save the full set of rows. By default a file is saved with Carriage Return - Linefeed characters between each row, but you can select to only have a Carriage Return character for Unix-type files.
If you select 'Append to file', the grid content will be added to the content of the file - if it exists -. If 'Append to file' is not ticked, and you chose an existing file name, this file would be overwritten by the matrix content.

Merge with file...

Merge with file allows to open a secondary file (with same options as "Open file") then through a set of options, allows to either select a set of "primary keys" between the two files in order to find the matching rows, or to simply add the second file as new rows or new columns. In case of a merge with matching rows, you can decide if the previous or newly loaded file takes precedence - and if you want to fulfill "empty cells" with values from the new file.
  1. Merge grids using lookup (primary keys) : this option is possible only if the two grids have the same structure (same number of columns). You select the list of columns from the grid, that will be used to "lookup" on the second grid. You can optionnally select "fuzzy matching" using Jaro-winkler algorithm. When rows from first grid and second one are matching, you can select to either keep values from first grid, or replace values from second grid. There is an optionto fulfill any empty cell with values from the new grid.
  2. Append all rows : rows from second grid will be added to the first one. This option is possible even if the number of columns do not match : columns will be empty if first grid contains more columns, and only columns matching the first grid columns will be loaded if second grid contains more columns.
  3. Add all columns in sequence : this option can be used even if grids have different structures. It just adds the rows from second grid, to the right of the first grid. If second grid contains more rows than first, only the number of rows of first grid will be loaded. If second grid contains less rows than first, empty cells will complement the columns for the last rows of first grid.
  4. Add all columns using lookup : this option can be used even if grids have different structures. It adds the rows from second grid, to the right of the first grid - but looking for matching rows based on a single-column lookup of key.

Generate random grid

This option opens the “Generate Random Grid” dialog. This allows you to generate rapidly mock-data / dummy data, for testing purpose for example. Left-side of the dialog allows you to define each column – selecting a header label, and a data type :

  • Int is a numerical value, by default between 0 and 99999 but you can provide personalized min and max values. An option allows to define a sequence rather than random numbers – the “min” value will be then allocated to first line, or only unique values in random order. You can decide to use a uniform distribution (each int has the same probability to get picked), or a normal distribution (with a mean and standard deviation), or a log-normal distribution.
  • Float  is a numerical value, similar to Int, but you can select a scale - a number of decimals after unit. Same as for Int, you can decide to use a uniform distribution (each int has the same probability to get picked), or a normal distribution (with a mean and standard deviation), or a log-normal distribution.
  • Text allows to get characters values, with a variety of options : Random will provide alternates of vowels and consonants – and you can define the max. text length, if the text shall include space chars or not, and case : lowercase, uppercase, or pretty-print (Lowercase with first char uppercase for each word). Beside “Random” text generation, you can select people names, user-Ids (Dummy ids with some prefix and name, including some digits), e-Mails, phone numbers, monetary values including a currency, generated companies names, Urls, etc… You can also select a “Set of constants and enter your own selected list of possible values. By including more than once the same value you can influence the probability for this value, e.g. if your list of constants is “true,false,false”, 2/3 of the values will be false. Warning : although data is randomly-generated, it may end up by coincidence with existing people names, URLs or e-mails. Text can also allow to generate UUID / Globally unique identifier, to generate unique identifiers following RFC 4122 specification.
  • Date and time datatype allows to randomly provide date/time values in a variety of formats. Min and max values can be provided (by default between 1/1/1900 and 31/12 of current year + 1) to define range. Buttons allow you to rapidly select ranges of "births" (from today - 70 years up to today - 18 years), "recent past" (from today - 2 years up to yesterday), "near future" (from tomorrow up to today + 2 years), that can then be further adjusted to your needs.
  • Boolean allows to select 0 or 1 values - without additional configuration.
  • Reference from file : the random generation can also get values from another CSV file - this is convenient to support "Foreign keys" for instance, to get values picked-up from a column from another file (either in random order or in sequence) that will be selected at generation time.

Before adding a column you can select the option to include some empty values in the column – with the approximate percentage of empty values expected.
Instead of manually defining the various columns you request in the matrix, you can also “Add random columns” which will randomly define for you the different types and options for between 5 and 12 new columns : this is a very quick way to build up a CSV file with “meaningful garbage” data.
Once you are fine with the random file definition (presented on the right side) you can further edit it (Double-click a column name to edit its label, use the up/down arrows to change ordering of columns and red cross to remove a column). You can also “Save” this file definition as a template to be reused for later re-generation of similar random files (or “Load” an existing file definition).
Last action is to define the number of rows you need and press “generate  - new grid” to get it. Optionally you can also only generate the defined columns as “added columns” on the right of the existing grid (Number of rows will adapt to existing number of rows in grid in such a case).

The Random grid generation also propose the direct-to-file generation to build big files in very rapid timeframe - this function called "Export direct file" allows to generate either a CSV file or a JSON structure based on the selected columns and patterns. Direct streaming occurs therefore files of multiple millions of rows are not a problem to be generated.

The Edit menu commands (processing of "Rows")

Commands from this menu allows you to manipulate the rows of the grid.

This menu allows to insert new rows, duplicate selected rows, delete selected rows or delete rows that are not selected.

Find..
The “Find” command (also available from contextual menu) provides a variety of selection features :
You can “find” values for a single column or for all columns – and can search rows starting, ending, matching exactly or containing a specific value. Advanced find features include a capability to search for duplicate values, or values matching either value from another column or any from a list of values copied in the clipboard.
Most options are standard find functions, like "starts with", "contains", "equals"... However some are more advanced find options:
  • Is different from previous value: allows to find "breaks" in the column, specifically useful if file is sorted - it will select only each of the first "different" value.
  • Has duplicates: different options exist for this find function. Simple "select all occurences" will allow to select all values if they appear more than once, but "select all but first" ensures that you can then delete the rows with duplicates, keeping one value of each of them though. "select only first occurence" on the opposite, can help you get a selection of only unique values. The three last options, "Similarity search", allows to use fuzzy-matching algorithms to find some duplicates although non-identical values. Jaro-winkler and Edit-distance algorithms can be parameterized to define a threshold of similarity, and the "soundex" similarity search will pop-up an outcome dialog where the soundex checksum score is added to your grid, to allow user to further control potential duplicates.

These find functions can be applied to a single column (including the first, row numbering col.) or to all columns of the grid (last option from the listbox). Search can be case-sensitive or not.

The standard outcome of a "find" action, is to select the resulting rows from the grid. But you can also "append" the resulting rows to your current selection, which allows for progressive finding of values, or "unselect" the resulting rows - keeping the ones that are not part of the found values untouched (selected or not). The last option is to open up the resulting rows in a separate grid, that you can use for latter processing (Export, further search, etc..).

  
Search & Replace...
The "Search & Replace" command allows to replace values from a column (or from all columns), through a variety of functions :
Either through simple search or Regular-expression based transformation,  but also allowing for Case change (Uppercase, Lowercase, Pretty Case...) , trimming of space characters (Left trim, right trim, full trim), removing non-alphabetic characters, or non-numeric characters (to cleanup content), keeping only a specific number of chars (from the left or from the right), or obfuscating data for anonymization purpose (Replacing all vowels by a, all consonnants by b, all digits by 0...). Since the search and replace command can apply either on all rows of the matrix or only the selected rows, this function can provided advanced transformation capabilities in conjunction with the "Find" function. You can find below the details of each search & replace options :
  • Replace: simple replacement of a text by another. Any part of cell fitting the source text will be replaced by the target text.
  • Replace full text: replacement of a text by another, if a cell fully equals to the source text it will be replaced by the target text.
  • Replace by a numeric sequence: this option does not take into account existing values, but just replace all values by 1, 2, 3.. This can help some processing, specifically in conjunction with "only in empty cells" or "only in selected rows"
  • Copy values from col: select a second column, and the values will be copied. This can be a way to complement values of a column, specifically in conjunction with "only in empty cells" or "only in selected rows"
  • Change to Uppercase, Lowercase, Prettycase, Camelcase: does change case of values in cell. Pretty Case will uppercase only the first letter of each word, Camelcase will uppercase first letter of each word and will remove any non-alpha character.
  • Remove all non-alpha, non-alphanumeric, non-numeric chars: transform cells values accordingly.
  • Remove repeating characters: remove the subsequent characters if identical - e.g. cell value "foo dataaaam" would become "fo datam"
  • Replace extended latine characters: this allows to get only Ascii chars from an extended latine chars file. éèë get all replaced by e, äàã by a, etc.. Some special codes are used to replace extended latine characters values.
  • Replace CR and CRLF : if Carriage-return and Linefeed characters were imported in the cells (e.g. using Clipboard import for instance), they can be replaced by any value of your choice or simply removed.
  • Left-trim, Right-trim, Trim spaces: this removes the leading spaces, trailing spaces, or both leading and trailing spaces in one action. It does not influence space characters embedded in text.
  • Fill-down : this allows to replace empty values of a column, by the last preceding non-empty value. All values of the column will get fulfilled - except the first ones if they were empty.
  • Obfuscate data: this is a way to scramble data in a non-reversible way, using one of two methods. First method: every vowel gets replaced by a and every consonant by b, and every digit gets replaced by 0. Second method: every vowel gets replaced by another random vowel, every consonant by another random consonant, and every digit gets replaced by a random digit. non-alphanumeric characters stay untouched.
  • Keep only n left chars, n right chars: you need to enter a numeric value - any cell value that is longer than this will be truncated to the number of chars, either by taking the prefix (left) or suffix (right) of the cell value.
  • Left-pad, Right-pad value with chars: you enter a string of chars - the existing value will be complemented by this string (either prefixed or suffixed), to reach the length of the entered string. All values will end up being of the same size. This does not impact the empty values. If you enter a single character, it will complement all values by repeating this character to reach the maximum value length.
  • Prefix with chars, suffix with chars: this option just append on left or right side of the values, with the entered target value.
  • Change chars position: this allows for powerful yet simple transformations. You enter a string and it will transform the source values from cell, by changing positions. For instance if your target is "4125", value ABCDE will become DABE. You can also keep position unchanged, e.g. "21........." will only swap the first two characters. Transformation limits the cells values length to your target entered value.
In addition to these transformations, some propose options to only impact "full empty cells", or "all cells" (not taking into account existing value), or act only on selected rows. The search part of transformation can be case sensitive or not, and when using standard replace, you can also use "Regex" functions for replacement of matching strings. The transformation can be made on a selected, single column, or can be applied to all columns of the grid (first option in listbox).


Compare with file command
This advanced "File Comparison" function allows to check differences between files - even if sort order is different, through the definition of "primary keys" columns. The command can report all differences found, or only missing rows from the reference grid or comparison file. Outcome of the comparison will be displayed in a separate window in two formats : textual or in a table. When selecting one of the discrepancies from this table, the related rows will also get selected from the main grids.
  1. You have to select the different columns that identify the unicity of a row in the grid, composing the "primary key". This will allow to report rows that are "the same" but with some differences on columns that are not part of the ones identifying the unicity. Commonly, one single column is selected, the "unique identifier" of the rows, but it can be composed of more than one column (e.g. First name and Last name). If you do not want to report differences at column level and consider any change to be a different row, you shall select all columns to be part of the primary key - every discrepency will be considered as "missing rows" from the grids. You have an option to report rows than contain the same primary key - reporting on row differences will always take into account the first matching row.
  2. Then you can provide a "label" for each of the 2 grids, that will be used in the report (e.g. "Grid V1" vs "Grid V2", "File" vs "Clipboard", etc..).
  3. You can then select what will be included in the report :
    • you can, for instance, provide a report of only exactly identical rows (when files are drastically different) : ticking only "Report identical rows"
    • "Report rows missing from grid 1" will list the rows present in the second file you load, not present in the current grid (i.e. "new rows")
    • "Report rows missing from grid 2" will list the rows present in the current grid, not present in the second file you load (i.e. "old rows, removed")
    • "Report row position differences" will report rows that are present and identical in both first grid and second file loaded, if they do not have the same position in file. Not the most common check you want to handle when looking for differences based on primary keys.
  4. Click "Select second grid.." to open another open file dialog - all options (e.g. loading from clipboard, filtering rows and columns, selecting the separator and character set...) are available. The resulting grid will be the one checked against the first loaded grid.
  5. Then the comparison process will be done, and both the second grid dialog and a result dialog open, the "Comparison report". It displays a header recap of the differences found, then two tabs : the first is a textual view of the differences, the second is a grid view of the differences. When selecting a row in this report, the relevant rows get selected in both grid 1 and grid 2. Rows can appear more than once if multiple differences are present on the same row, different columns.

Validate Grid Profile
This function allows to control the loaded file/grid, with a defined "profile" : for each column you can set if it has to be fulfilled or can be empty, should only store unique/distinct values, a min. and max. number of characters, and matches some rules on content : either through a RegEx or with simple checks like only alpha, only uppercase, in a list of values, etc...
This "profile" can be stored to reuse and check multiple files.
You can also use a "Wizard" to determine the profile rules based on the grid content - with some "precision", it will allow to set rules for which 100%, 99%, etc... of rows should match the profile.
The Structure menu commands (processing of "Columns")

This menu allows to alter the structure of the matrix (adding or removing columns to the grid).

Edit Structure
The "Edit structure" dialog provides a capability to re-organize the columns - select only a few, copy or add new ones  with default values or values based from clipboard content. It also allows to concatenate columns together.
The description of transformation can be stored in a "Map" - convenient feature to re-execute same transformation on different files. A "Map" can execute this transformation either based on columns position (e.g. select only Col. 1 and Col. 4 from a file) or on columns labels (e.g. select only Col. labelled ID and Col. labelled NAME from source file - whatever the position of these columns in the file).

Transpose

Transpose grid menu allows to pivot the table - getting columns as rows. The default  transpose function will use the current headers as the first column of transposed grid - and the current first column as headers of transposed grid, but you can select to not use the current headers as first column, and/or not use the current first column as headers for the transposed grid.

Columns insert from clipboard values
Quick insertion of columns can be done based on clipboard content : the dialog allows to select the columns-separator for the clipboard values (Clipboard is always managed as Unicode text).
A variety of options are possible :
- to "loop" and repeat the first values, up to grid completion
- to only include once the values - keeping blank the values for the following rows
- to use a "lookup" : by selecting the column in existing grid matching the clipboard first column values, the columns from clipboard will be inserted for the rows matching these values.
If there are too many values - Charsep will only complement up to the filling of the new columns for the grid existing number of rows.
You can optionally set values only on the selected rows, and decide if the clipboard contains also a "headers row" or not.

Empty columns deletion
You can also "delete empty columns" : all columns with no data on any row will be removed from the grid.

Deletion of last selected column
You can easily remove one column from structure by clicking on this column - wherever in the grid - then select this option.
The toolbar commands
The toolbar provides a quick search tool, that allows to search either a value in a cell of the grid or a value in the headers of the grid. It will move focus to position on the first found occurence - if you click again on "Search" or "Search headers" it will move focus to the next occurence. This search can be case-sensitive or not.

In the toolbar, there is also a combo-box that allows to display the sequence number of the column, or a letter - in place of the header loaded from file.

The "row selection" function (also available in contextual menu - see "Selection functions"), provides various ways to select rows from the grid : all rows, rows up to a row position or from a row position until the last one, or only "odd" or "even" rows. There is also a possibility to select an exact number of rows - evenly distributed in the grid (for instance, if you select 200 evenly distributed rows in a grid of 1000 rows, every 5th row will get selected - rows 1, 6, 11, 16,, 21, ...).

The "process" function opens a dialog to configure and execute file-based processing. It also provides a convenient way to configure command-line configuration files ("cfg file") that can be used in command-line processing. File-based processing allows for ultra-fast processing of very large files without in-memory limitations. For details on type of file-based processing, check Command-line processing help.
The contextual menu commands

By right-clicking on the matrix you will get many useful functionalities on the grid.

Selection functions :
You can select all rows - or invert the current selection.
"Show first selected row" is commonly used in conjunction with the "Find..." option, that allows to do complex searches on the grid - resulting set being selected.
In "select rows where" dialog, you can select the first n rows, last n rows, or a specific row number - but can also use more advanced functions like selecting all "even" rows, all "odd" rows, or a number of rows that will be evenly distributed throughout the grid.

Find functions :
You can "Find..." rows with a column (by default the one in which you right-clicked) or all columns eligible to start with, match exactly, contain, do not contain, be different of a value, match a regular expression, end with a value, or being empty/not empty.
Some more complex searches are possible : search for values different from the previous row, for values that have duplicates in the column (for which you can either select all duplicates, only the first occurence of duplicates, all occurences but the first (helps to remove the duplicates still keeping 1 row), unique values only. You can also select rows where a column values matches values from another column - or all rows for which a column values match values from the clipboard, Carriage-return separated.
You have the possibility to include a second find condition for the same or another column - 'AND' : in addition to the first one to select only rows eligible for both search conditions or 'OR' : to select rows eligible either for the first or second condition.
The "Find" results are usually returned as "selected rows" but you can also decide to append this selection to the currently selected rows, or deselecting the eligible rows (this allows very complex additive or substractive searches), or display the resulting rows in a separate dialog.
Most options are standard find functions, like "starts with", "contains", "equals"... However some are more advanced find options:
  • Is different from previous value: allows to find "breaks" in the column, specifically useful if file is sorted - it will select only each of the first "different" value.
  • Has duplicates: different options exist for this find function. Simple "select all occurences" will allow to select all values if they appear more than once, but "select all but first" ensures that you can then delete the rows with duplicates, keeping one value of each of them though. "select only first occurence" on the opposite, can help you get a selection of only unique values. The three last options, "Similarity search", allows to use fuzzy-matching algorithms to find some duplicates although non-identical values. Jaro-winkler and Edit-distance algorithms can be parameterized to define a threshold of similarity, and the "soundex" similarity search will pop-up an outcome dialog where the soundex checksum score is added to your grid, to allow user to further control potential duplicates.

Search/Replace and transform functions :
Beside the standard "Replace all values X by Y" you can replace values only when the full cell corresponds to the source value (optionnally case-sensitive) or all empty cells. The "target value" can be a static value, a numeric sequence (Starting from 1, 2, 3...), or values from another column on the same row.
"Transformation" does not require to enter a target value but use the source value and apply a transformation like changing to Uppercase, Lowercase, Pretty case, trim the values (left, right or both..), truncate the values to a certain number of characters (keep only the n left characters or n right characters - or full value if number of characters is lower than the entered one). You can also "pad" the value with a set of characters. For instance if you left-pad value "Abc" with "12345" the result will become "12Abc".
You can choose to apply the replacement or transformation either on all rows or only on selected rows - and for 1 specific column or for all. In conjunction with find/select functions, this search/replace capability brings endless possibilities of transformation...
You can find below the details of each search & replace options :
  • Replace: simple replacement of a text by another. Any part of cell fitting the source text will be replaced by the target text.
  • Replace full text: replacement of a text by another, if a cell fully equals to the source text it will be replaced by the target text.
  • Replace by a numeric sequence: this option does not take into account existing values, but just replace all values by 1, 2, 3.. This can help some processing, specifically in conjunction with "only in empty cells" or "only in selected rows"
  • Copy values from col: select a second column, and the values will be copied. This can be a way to complement values of a column, specifically in conjunction with "only in empty cells" or "only in selected rows"
  • Change to Uppercase, Lowercase, Prettycase, Camelcase: does change case of values in cell. Pretty Case will uppercase only the first letter of each word, Camelcase will uppercase first letter of each word and will remove any non-alpha character.
  • Remove all non-alpha, non-alphanumeric, non-numeric chars: transform cells values accordingly.
  • Remove repeating characters: remove the subsequent characters if identical - e.g. cell value "foo dataaaam" would become "fo datam"
  • Replace extended latine characters: this allows to get only Ascii chars from an extended latine chars file. éèë get all replaced by e, äàã by a, etc.. Some special codes are used to replace extended latine characters values.
  • Replace CR and CRLF : if Carriage-return and Linefeed characters were imported in the cells (e.g. using Clipboard import for instance), they can be replaced by any value of your choice or simply removed.
  • Left-trim, Right-trim, Trim spaces: this removes the leading spaces, trailing spaces, or both leading and trailing spaces in one action. It does not influence space characters embedded in text.
  • Fill-down : this allows to replace empty values of a column, by the last preceding non-empty value. All values of the column will get fulfilled - except the first ones if they were empty.
  • Obfuscate data: this is a way to scramble data in a non-reversible way, using one of two methods. First method: every vowel gets replaced by a and every consonant by b, and every digit gets replaced by 0. Second method: every vowel gets replaced by another random vowel, every consonant by another random consonant, and every digit gets replaced by a random digit. non-alphanumeric characters stay untouched.
  • Keep only n left chars, n right chars: you need to enter a numeric value - any cell value that is longer than this will be truncated to the number of chars, either by taking the prefix (left) or suffix (right) of the cell value.
  • Left-pad, Right-pad value with chars: you enter a string of chars - the existing value will be complemented by this string (either prefixed or suffixed), to reach the length of the entered string. All values will end up being of the same size. This does not impact the empty values. If you enter a single character, it will complement all values by repeating this character to reach the maximum value length.
  • Prefix with chars, suffix with chars: this option just append on left or right side of the values, with the entered target value.
  • Change chars position: this allows for powerful yet simple transformations. You enter a string and it will transform the source values from cell, by changing positions. For instance if your target is "4125", value ABCDE will become DABE. You can also keep position unchanged, e.g. "21........." will only swap the first two characters. Transformation limits the cells values length to your target entered value.
In addition to these transformations, some propose options to only impact "full empty cells", or "all cells" (not taking into account existing value), or act only on selected rows. The search part of transformation can be case sensitive or not, and when using standard replace, you can also use "Regex" functions for replacement of matching strings. The transformation can be made on a selected, single column, or can be applied to all columns of the grid (first option in listbox).

Export :
The contextual menu has also an 'export' functionality to generate a file out of the grid or to load the grid into the clipboard for using it with other applications. For CSV export, you can select the separator, include or not the header, select all columns or just some, and all or only the selected rows. A direct HTML or XLS export can be done for web browser visualisation or to load data into Excel.
 XML and JSON export allows to define your personalized template to transform the grid content into an XML structure. This is a standard functionality for all grids of the application - obviously the menu-accessible funtion to save a file brings more functionality than to export it through contextual menu.

A basic print functionality is also available in the contextual menu.

You can also display the distinct values of a column and the count of occurences for each of those values. This will open a separate window to list these distinct values - and when you double-click one of the values in this window, all the correspondant rows in the main grid will get selected.

The last option of this menu displays the index of the row currently selected, or the number of selected rows if more than one - and allows to open those selected rows in a separated dialog.

These contextual functions are available on the main matrix but also on any of the tables / grids of the tool (e.g. in the list of "distinct values" for instance).
Command line processing
In addition to the graphical user interface - interactive processing of csv files, you can also use a commandline to apply some transformations on a csv file. Since many arguments are required to be provided, they must be stored in a "Command file". The only arguments on the command line are this command file name, plus optionally (if not included in command file) the input and output files.
Preparation of this command line configuration file can be done using the dialog opened using button "Process". You can directly execute this processing from the interactive interface (which allows to process files of any size - processing does not require to fully load files in memory).

The command-line is executed using the following command :

java -cp charsep.jar com.charsep.CsvProcess <commandfile> [<inputfile> <outputfile>]


Command File should contain the following properties ( <property>=<value> )
  • INPUT_FILE=<inputfile> If not included will use <inputfile> from command-line
  • INPUT_CHR=<characterset> default = UTF8
  • INPUT_SEPARATOR=<separator_charcode> default = PIPE separated values |
  • OUTPUT_FILE=<outputfile> If not included will use <outputfile> from command-line
  • OUTPUT_CHR=<characterset> default = UTF8
  • OUTPUT_SEPARATOR=<separator_charcode> default = Same as input separator
  • ROWS_HEADERS=<nn> represent the number of header rows to skip. default = 0
  • REMOVE_MULTIBYTES=<true|false> default = false
  • CORRECTION=<NONE|SKIP|APPEND> If not included will use NONE
  • CORRECTION_COLS=<nn> If CORRECTION is SKIP or APPEND, provide number of cols expected. default = 1
  • ACTION= see below - default = COPY_ALL
  • COLUMNS=<colA colB ...> If ACTION is INCLUDE or EXCLUDE list the related columns numbers
  • PROFILE_FILE=<profilefile.ccn]> If ACTION is PROFILE_CHECK or TRANSFORM define the profile file to check or target transformed structure
  • VERBOSITY=<nn> 0 No message 1 Display only errors 2 Display warnings 3 Display Information. default = 2
Action :
  • COPY_ALL       all columns are copied in target
  • COPY_INCLUDE   only include a list of columns in target
  • COPY_EXCLUDE   exclude a list of columns from target - copy all others
  • COPY_NONBLANKS include in target only columns for which there is at least one value
  • PROFILE_CHECK  target file does expose the set of rows/columns conflicting with the profile rules
  • TRANSFORM_NUM   target file does map to the structure file based on columns positions       
  • TRANSFORM_NAME  target file does map to the structure file based on columns labels
Correction mode :
  • NONE     takes all rows in account - appends blank columns if not the same number as on first row
  • SKIP       if not the same number as on first row - full row will be skipped
  • APPEND if not the same number as on first row - will concatenate next rows until correct number of columns is reached

Example 1 : command file to copy a UTF-8 file - changing separator from Pipe to Comma and removing header:

ACTION=COPY_ALL
INPUT_SEPARATOR=PIPE
OUTPUT_SEPARATOR=COMMA
ROWS_HEADERS=1
VERBOSITY=2

Example 2 : command file to check a UTF-8 file with header, pipe-separated, based on a profile check file stored in C:\checkfile.ccn :
ACTION=PROFILE_CHECK
ROWS_HEADERS=1
PROFILE_FILE=C:\checkfile.ccn
VERBOSITY=2

Example 3 : command file to transform UTF-8 file called C:\Sample.csv with header, pipe-separated, to a Windows-1251 file called C:\SampleReverse.csv, tab-separated, based on structure change file C:\reverse.cmp using columns positions from input file:
ACTION=TRANSFORM_NUM
CORRECTION=NONE
INPUT_CHR=UTF-8
INPUT_FILE=C:\Sample.csv
INPUT_SEPARATOR=PIPE
OUTPUT_CHR=cp1251
OUTPUT_FILE=C:\SampleReverse.csv
OUTPUT_SEPARATOR=TAB
PROFILE_FILE=C:\reverse.cmp
REMOVE_MULTIBYTES=FALSE
ROWS_HEADERS=0
VERBOSITY=0

 << back to Charsep main page