[ https://jira.jboss.org/jira/browse/DNA-105?page=com.atlassian.jira.plugin... ]
Brian Carothers updated DNA-105:
--------------------------------
Attachment: DNA-105_preview.patch
Attached patch that adds a new project with two new sequencers, one for sequencing delimited
text files and one for sequencing fixed width files. Both sequencers are row-based and assume
that the file is laid out with one row per line, with the EOL character being one recognized
by BufferedReader.readLine(). Both sequencers support basic functionality like being able
to specify a comment character and a custom factory for creating the record associated with
each row.
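
As a rough illustration of the row-based reading described above (this is not code from the
patch), the sketch below reads rows with BufferedReader.readLine() and skips lines that start
with a comment marker. The class name RowReaderSketch, the method readRows, and the
"starts with the marker" rule are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not taken from DNA-105_preview.patch.
final class RowReaderSketch {

    // Returns every line that does not start with commentMarker.
    // Pass null as commentMarker to keep all lines.
    static List<String> readRows(Reader source, String commentMarker) {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() treats \n, \r, and \r\n as line terminators and strips them.
            while ((line = reader.readLine()) != null) {
                if (commentMarker != null && line.startsWith(commentMarker)) {
                    continue; // assumed rule: a comment line begins with the marker
                }
                rows.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String content = "# a comment\nfoo,bar\r\nbaz,qux\n";
        for (String row : readRows(new StringReader(content), "#")) {
            System.out.println(row);
        }
    }
}
```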
The DelimitedTextSequencer allows users to specify a particular regular expression that is
used to split each line into columns. This value defaults to ",", presumably
allowing this solution to address DNA-229 as well. Since any valid regular expression can
be used, this also allows for the use of multiple delimiters.
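
To show how a regular expression can act as the delimiter, here is a minimal sketch using
java.util.regex.Pattern. The class DelimitedSplitSketch and the method splitColumns are
hypothetical names, and the choice to keep trailing empty columns (limit -1) is an assumption,
not something the patch is known to do.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not taken from DNA-105_preview.patch.
final class DelimitedSplitSketch {

    // Splits one line into columns using splitPattern as a regular expression.
    // The limit of -1 keeps trailing empty columns (an assumption for this sketch).
    static String[] splitColumns(String line, String splitPattern) {
        return Pattern.compile(splitPattern).split(line, -1);
    }

    public static void main(String[] args) {
        // Default-style comma delimiter.
        System.out.println(Arrays.toString(splitColumns("a,b,c", ",")));
        // A character class accepts several delimiters at once: comma, semicolon, or pipe.
        System.out.println(Arrays.toString(splitColumns("a;b|c,d", "[,;|]")));
    }
}
```

Because the delimiter is a regular expression, characters such as "|" or "." need to be escaped
or placed in a character class when they are meant literally.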
The FixedWidthTextSequencer allows users to specify the index at which each column starts,
either as an array of ints or as a single, comma-delimited string. The first column always
starts at index 0, so as an example, a user could specify { 3, 6 } to sequence this line:
onetwothree
into this array of columns:
one, two, three
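
The fixed width example above can be sketched as follows. This is an illustration under
assumptions, not the patch's code: FixedWidthSplitSketch, splitColumns, and parseColumnStarts
are hypothetical names, the start indexes are assumed to be in ascending order, and lines
shorter than a start index are assumed to yield empty columns.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not taken from DNA-105_preview.patch.
final class FixedWidthSplitSketch {

    // Splits line at the given start indexes; the first column implicitly starts at 0.
    // Assumes columnStarts is sorted ascending. Short lines produce empty columns.
    static String[] splitColumns(String line, int[] columnStarts) {
        String[] columns = new String[columnStarts.length + 1];
        int begin = 0;
        for (int i = 0; i < columnStarts.length; i++) {
            int end = Math.min(columnStarts[i], line.length());
            columns[i] = line.substring(Math.min(begin, line.length()), end);
            begin = columnStarts[i];
        }
        columns[columnStarts.length] = line.substring(Math.min(begin, line.length()));
        return columns;
    }

    // Parses a comma-delimited string such as "3, 6" into an int array.
    static int[] parseColumnStarts(String commaDelimited) {
        if (commaDelimited.trim().isEmpty()) {
            return new int[0];
        }
        String[] parts = commaDelimited.split(",");
        int[] starts = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            starts[i] = Integer.parseInt(parts[i].trim());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Prints [one, two, three]
        System.out.println(Arrays.toString(splitColumns("onetwothree", new int[] {3, 6})));
        // The same start indexes given as a comma-delimited string.
        System.out.println(Arrays.toString(splitColumns("onetwothree", parseColumnStarts("3, 6"))));
    }
}
```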
The attached patch is about as undocumented as possible, but it's exceedingly
straightforward. In fact, there are only about 100 lines of actual code. The rest is
just test cases and test data. I'm going to leave this out for a day or two in case
anyone has comments. If no major concerns are expressed, I'm going to add
documentation, commit this, and then treat any future bugs or feature requests as separate
issues.
Sequence content from delimited text files
------------------------------------------
Key: DNA-105
URL: https://jira.jboss.org/jira/browse/DNA-105
Project: DNA
Issue Type: Feature Request
Components: Sequencers
Reporter: Greg Haber
Priority: Minor
Fix For: Future Releases
Attachments: DNA-105_preview.patch
It is a somewhat common practice for delimited text files such as comma-separated (CSV)
or tab delimited files to contain a list of column names in the first line of the file. A
sequencer to capture this metadata could be useful, and should be straightforward to
implement.
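
A minimal sketch of what capturing that first-line metadata might look like, assuming the
header row uses the same delimiter as the data rows. The names HeaderRowSketch and columnNames
are hypothetical and are not part of any existing sequencer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only.
final class HeaderRowSketch {

    // Reads the first line of content and splits it into column names.
    // splitPattern is a regular expression, e.g. "," for CSV or "\t" for tab-delimited.
    static List<String> columnNames(String content, String splitPattern) {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String header = reader.readLine();
            if (header == null) {
                return Collections.emptyList(); // empty file: no column names
            }
            return Arrays.asList(header.split(splitPattern, -1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String tabDelimited = "name\tage\tcity\nAnn\t30\tOslo\n";
        System.out.println(columnNames(tabDelimited, "\t"));
    }
}
```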