Rexexp to match all the numbers,alphabets,special characters in a string - regex

I want a pattern to match a string that has everything in it(alphabets,numbers,special charactres)
public static void main(String[] args) {
String retVal=null;
try
{
String s1 = "[0-9a-zA-Z].*:[0-9a-zA-Z].*:(.*):[0-9a-zA-Z].*";
String s2 = "BNTPSDAE31G:BNTPSDAE:Healthcheck:Major";
Pattern pattern = null;
//if ( ! StringUtils.isEmpty(s1) )
if ( ( s1 != null ) && ( ! s1.matches("\\s*") ) )
{
pattern = Pattern.compile(s1);
}
//if ( ! StringUtils.isEmpty(s2) )
if ( s2 != null )
{
Matcher matcher = pattern.matcher( s2 );
if ( matcher.matches() )
{
retVal = matcher.group(1);
// A special case/kludge for Asentria. Temp alarms contain "Normal/High" etc.
// Switch Normal to return CLEAR. The default for this usage will be RAISE.
// Need to handle switches in XML. This won't work if anyone puts "normal" in their event alias.
if ("Restore".equalsIgnoreCase ( retVal ) )
{
}
}
}
}
catch( Exception e )
{
System.out.println("Error evaluating args : " );
}
System.out.println("retVal------"+retVal);
}
and output is:
Healthcheck
Hera using this [0-9a-zA-Z].* am matching only alpahbets and numbers,but i want to match the string if it has special characters also
Any help is highly appreciated

Try this:
If you want to match individual elements try this:
2.1.2 :001 > s = "asad3435##:$%adasd1213"
2.1.2 :008 > s.scan(/./)
=> ["a", "s", "a", "d", "3", "4", "3", "5", "#", "#", ":", "$", "%", "a", "d", "a", "s", "d", "1", "2", "1", "3"]
or you want match all at once try this:
2.1.2 :009 > s.scan(/[^.]+/)
=> ["asad3435##:$%adasd1213"]

Try the following regex, it works for me :)
[^:]+
You might need to put a global modifier on it to get it to match all strings.

Related

How to select first chars with a custom word boundary?

I've test cases with a series of words like this :
{
input: "Halley's Comet",
expected: "HC",
},
{
input: "First In, First Out",
expected: "FIFO",
},
{
input: "The Road _Not_ Taken",
expected: "TRNT",
},
I want with one regex to match all first letters of these words, avoid char: "_" to be matched as a first letter and count single quote in the word.
Currently, I have this regex working on pcre syntax but not with Go regexp package : (?<![a-zA-Z0-9'])([a-zA-Z0-9'])
I know lookarounds aren't supported by Go but I'm looking for a good way to do that.
I also use this func to get an array of all strings : re.FindAllString(s, -1)
Thanks for helping.
Something that plays with character classes and word boundaries should suffice:
\b_*([a-z])[a-z]*(?:'s)?_*\b\W*
demo
Usage:
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`(?i)\b_*([a-z])[a-z]*(?:'s)?_*\b\W*`)
fmt.Println(re.ReplaceAllString("O'Brian's dog", "$1"))
}
ftr, regexp less solution
package main
import (
"fmt"
)
func main() {
inputs := []string{"Hallمرحباey's Comet", "First In, First Out", "The Road _Not_ Taken", "O'Brian's Dog"}
c := [][]string{}
w := [][]string{}
for _, input := range inputs {
c = append(c, firstLet(input))
w = append(w, words(input))
}
fmt.Printf("%#v\n", w)
fmt.Printf("%#v\n", c)
}
func firstLet(in string) (out []string) {
var inword bool
for _, r := range in {
if !inword {
if isChar(r) {
inword = true
out = append(out, string(r))
}
} else if r == ' ' {
inword = false
}
}
return out
}
func words(in string) (out []string) {
var inword bool
var w []rune
for _, r := range in {
if !inword {
if isChar(r) {
w = append(w, r)
inword = true
}
} else if r == ' ' {
if len(w) > 0 {
out = append(out, string(w))
w = w[:0]
}
inword = false
} else if r != '_' {
w = append(w, r)
}
}
if len(w) > 0 {
out = append(out, string(w))
}
return out
}
func isChar(r rune) bool {
return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}
outputs
[][]string{[]string{"Hallمرحباey's", "Comet"}, []string{"First", "In,", "First", "Out"}, []string{"The", "Road", "Not", "Taken"}, []string{"O'Brian's", "Dog"}}
[][]string{[]string{"H", "C"}, []string{"F", "I", "F", "O"}, []string{"T", "R", "N", "T"}, []string{"O", "D"}}

How do I swap some characters of a String with values of a HashMap<String,String> in Kotlin?

Assuming I have a
val s: String = "14ABC5"
and have a HashMap
val b: HashMap<String,String> = hashMapOf("A" to "10", "B" to "11", "C" to "12", "D" to "13", "E" to "14", "F" to "15" )
How would I change all occurrences of A,B,C with 10, 11, 12 while keeping their order ("1", "4", "10", "11", "12", "5")?
So far I have this
val result: List<String> = s.toUpperCase().toCharArray().map{ it.toString() }.map{ it -> b.getValue(it)}
which works if ALL characters of the String exist in the HashMap but my String may contain inexistent keys as well.
You could either use getOrDefault(...), or the Kotlinesque b[it] ?: it.
By the way, if you're using the implicit lambda argument name (it), you can get rid of the it ->.
You can use the String as an iterable by default and simplify your code as follows:
s.map { it.toString() }
.map { b[it] ?: it }

Remove quotes between letters

In golang, how can I remove quotes between two letters, like that:
import (
"testing"
)
func TestRemoveQuotes(t *testing.T) {
var a = "bus\"zipcode"
var mockResult = "bus zipcode"
a = RemoveQuotes(a)
if a != mockResult {
t.Error("Error or TestRemoveQuotes: ", a)
}
}
Function:
import (
"fmt"
"strings"
)
func RemoveQuotes(s string) string {
s = strings.Replace(s, "\"", "", -1) //here I removed all quotes. I'd like to remove only quotes between letters
fmt.Println(s)
return s
}
For example:
"bus"zipcode" = "bus zipcode"
You may use a simple \b"\b regex that matches a double quote only when preceded and followed with word boundaries:
package main
import (
"fmt"
"regexp"
)
func main() {
var a = "\"test1\",\"test2\",\"tes\"t3\""
fmt.Println(RemoveQuotes(a))
}
func RemoveQuotes(s string) string {
re := regexp.MustCompile(`\b"\b`)
return re.ReplaceAllString(s, "")
}
See the Go demo printing "test1","test2","test3".
Also, see the online regex demo.
I am not sure about what you need when you commented I want to only quote inside test3.
This code is removing the quotes from the inside, as you did, but it is adding the quotes with fmt.Sprintf()
package main
import (
"fmt"
"strings"
)
func main() {
var a = "\"test1\",\"test2\",\"tes\"t3\""
fmt.Println(RemoveQuotes(a))
}
func RemoveQuotes(s string) string {
s = strings.Replace(s, "\"", "", -1) //here I removed all quotes. I'd like to remove only quotes between letters
return fmt.Sprintf(`"%s"`, s)
}
https://play.golang.org/p/dKB9DwYXZp
In your example you define a string variable so the outer quotes are not part of the actual string. If you would do fmt.Println("bus\"zipcode") the output on the screen would be bus"zipcode. If your goal is to replace quotes in a string with a space then you need to replace the quote not with an empty string as you do, but rather with a space - s = strings.Replace(s, "\"", " ", -1). Though if you want to remove the quotes entirely you can do something like this:
package main
import (
"fmt"
"strings"
)
func RemoveQuotes(s string) string {
result := ""
arr := strings.Split(s, ",")
for i:=0;i<len(arr);i++ {
sub := strings.Replace(arr[i], "\"", "", -1)
result = fmt.Sprintf("%s,\"%s\"", result, sub)
}
return result[1:]
}
func main() {
a:= "\"test1\",\"test2\",\"tes\"t3\""
fmt.Println(RemoveQuotes(a))
}
Note however that this is not very efficient, but I assume it's more about learning how to do it in this case.

How to split string on comma, except when comma is parameter separator or part of string [duplicate]

Given
2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,"Corvallis, OR",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34
How to use C# to split the above information into strings as follows:
2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
As you can see one of the column contains , <= (Corvallis, OR)
Based on
C# Regex Split - commas outside quotes
string[] result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
Use the Microsoft.VisualBasic.FileIO.TextFieldParser class. This will handle parsing a delimited file, TextReader or Stream where some fields are enclosed in quotes and some are not.
For example:
using Microsoft.VisualBasic.FileIO;
string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields;
while (!parser.EndOfData)
{
fields = parser.ReadFields();
foreach (string field in fields)
{
Console.WriteLine(field);
}
}
parser.Close();
This should result in the following output:
2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
See Microsoft.VisualBasic.FileIO.TextFieldParser for more information.
You need to add a reference to Microsoft.VisualBasic in the Add References .NET tab.
It is so much late but this can be helpful for someone. We can use RegEx as bellow.
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = CSVParser.Split(Test);
I see that if you paste csv delimited text in Excel and do a "Text to Columns", it asks you for a "text qualifier". It's defaulted to a double quote so that it treats text within double quotes as literal. I imagine that Excel implements this by going one character at a time, if it encounters a "text qualifier", it keeps going to the next "qualifier". You can probably implement this yourself with a for loop and a boolean to denote if you're inside literal text.
public string[] CsvParser(string csvText)
{
List<string> tokens = new List<string>();
int last = -1;
int current = 0;
bool inText = false;
while(current < csvText.Length)
{
switch(csvText[current])
{
case '"':
inText = !inText; break;
case ',':
if (!inText)
{
tokens.Add(csvText.Substring(last + 1, (current - last)).Trim(' ', ','));
last = current;
}
break;
default:
break;
}
current++;
}
if (last != csvText.Length - 1)
{
tokens.Add(csvText.Substring(last+1).Trim());
}
return tokens.ToArray();
}
You could split on all commas that do have an even number of quotes following them.
You would also like to view at the specf for CSV format about handling comma's.
Useful Link : C# Regex Split - commas outside quotes
Use a library like LumenWorks to do your CSV reading. It'll handle fields with quotes in them and will likely overall be more robust than your custom solution by virtue of having been around for a long time.
It is a tricky matter to parse .csv files when the .csv file could be either comma separated strings, comma separated quoted strings, or a chaotic combination of the two. The solution I came up with allows for any of the three possibilities.
I created a method, ParseCsvRow() which returns an array from a csv string. I first deal with double quotes in the string by splitting the string on double quotes into an array called quotesArray. Quoted string .csv files are only valid if there is an even number of double quotes. Double quotes in a column value should be replaced with a pair of double quotes (This is Excel's approach). As long as the .csv file meets these requirements, you can expect the delimiter commas to appear only outside of pairs of double quotes. Commas inside of pairs of double quotes are part of the column value and should be ignored when splitting the .csv into an array.
My method will test for commas outside of double quote pairs by looking only at even indexes of the quotesArray. It also removes double quotes from the start and end of column values.
public static string[] ParseCsvRow(string csvrow)
{
const string obscureCharacter = "ᖳ";
if (csvrow.Contains(obscureCharacter)) throw new Exception("Error: csv row may not contain the " + obscureCharacter + " character");
var unicodeSeparatedString = "";
var quotesArray = csvrow.Split('"'); // Split string on double quote character
if (quotesArray.Length > 1)
{
for (var i = 0; i < quotesArray.Length; i++)
{
// CSV must use double quotes to represent a quote inside a quoted cell
// Quotes must be paired up
// Test if a comma lays outside a pair of quotes. If so, replace the comma with an obscure unicode character
if (Math.Round(Math.Round((decimal) i/2)*2) == i)
{
var s = quotesArray[i].Trim();
switch (s)
{
case ",":
quotesArray[i] = obscureCharacter; // Change quoted comma seperated string to quoted "obscure character" seperated string
break;
}
}
// Build string and Replace quotes where quotes were expected.
unicodeSeparatedString += (i > 0 ? "\"" : "") + quotesArray[i].Trim();
}
}
else
{
// String does not have any pairs of double quotes. It should be safe to just replace the commas with the obscure character
unicodeSeparatedString = csvrow.Replace(",", obscureCharacter);
}
var csvRowArray = unicodeSeparatedString.Split(obscureCharacter[0]);
for (var i = 0; i < csvRowArray.Length; i++)
{
var s = csvRowArray[i].Trim();
if (s.StartsWith("\"") && s.EndsWith("\""))
{
csvRowArray[i] = s.Length > 2 ? s.Substring(1, s.Length - 2) : ""; // Remove start and end quotes.
}
}
return csvRowArray;
}
One downside of my approach is the way I temporarily replace delimiter commas with an obscure unicode character. This character needs to be so obscure, it would never show up in your .csv file. You may want to put more handling around this.
This question and its duplicates have a lot of answers. I tried this one that looked promising, but found some bugs in it. I heavily modified it so that it would pass all of my tests.
/// <summary>
/// Returns a collection of strings that are derived by splitting the given source string at
/// characters given by the 'delimiter' parameter. However, a substring may be enclosed between
/// pairs of the 'qualifier' character so that instances of the delimiter can be taken as literal
/// parts of the substring. The method was originally developed to split comma-separated text
/// where quotes could be used to qualify text that contains commas that are to be taken as literal
/// parts of the substring. For example, the following source:
/// A, B, "C, D", E, "F, G"
/// would be split into 5 substrings:
/// A
/// B
/// C, D
/// E
/// F, G
/// When enclosed inside of qualifiers, the literal for the qualifier character may be represented
/// by two consecutive qualifiers. The two consecutive qualifiers are distinguished from a closing
/// qualifier character. For example, the following source:
/// A, "B, ""C"""
/// would be split into 2 substrings:
/// A
/// B, "C"
/// </summary>
/// <remarks>Originally based on: https://stackoverflow.com/a/43284485/2998072</remarks>
/// <param name="source">The string that is to be split</param>
/// <param name="delimiter">The character that separates the substrings</param>
/// <param name="qualifier">The character that is used (in pairs) to enclose a substring</param>
/// <param name="toTrim">If true, then whitespace is removed from the beginning and end of each
/// substring. If false, then whitespace is preserved at the beginning and end of each substring.
/// </param>
public static List<String> SplitQualified(this String source, Char delimiter, Char qualifier,
Boolean toTrim)
{
// Avoid throwing exception if the source is null
if (String.IsNullOrEmpty(source))
return new List<String> { "" };
var results = new List<String>();
var result = new StringBuilder();
Boolean inQualifier = false;
// The algorithm is designed to expect a delimiter at the end of each substring, but the
// expectation of the caller is that the final substring is not terminated by delimiter.
// Therefore, we add an artificial delimiter at the end before looping through the source string.
String sourceX = source + delimiter;
// Loop through each character of the source
for (var idx = 0; idx < sourceX.Length; idx++)
{
// If current character is a delimiter
// (except if we're inside of qualifiers, we ignore the delimiter)
if (sourceX[idx] == delimiter && inQualifier == false)
{
// Terminate the current substring by adding it to the collection
// (trim if specified by the method parameter)
results.Add(toTrim ? result.ToString().Trim() : result.ToString());
result.Clear();
}
// If current character is a qualifier
else if (sourceX[idx] == qualifier)
{
// ...and we're already inside of qualifier
if (inQualifier)
{
// check for double-qualifiers, which is escape code for a single
// literal qualifier character.
if (idx + 1 < sourceX.Length && sourceX[idx + 1] == qualifier)
{
idx++;
result.Append(sourceX[idx]);
continue;
}
// Since we found only a single qualifier, that means that we've
// found the end of the enclosing qualifiers.
inQualifier = false;
continue;
}
else
// ...we found an opening qualifier
inQualifier = true;
}
// If current character is neither qualifier nor delimiter
else
result.Append(sourceX[idx]);
}
return results;
}
Here are the test methods to prove that it works:
[TestMethod()]
public void SplitQualified_00()
{
// Example with no substrings
String s = "";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "" }, substrings);
}
[TestMethod()]
public void SplitQualified_00A()
{
// just a single delimiter
String s = ",";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "", "" }, substrings);
}
[TestMethod()]
public void SplitQualified_01()
{
// Example with no whitespace or qualifiers
String s = "1,2,3,1,2,3";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_02()
{
// Example with whitespace and no qualifiers
String s = " 1, 2 ,3, 1 ,2\t, 3 ";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_03()
{
// Example with whitespace and no qualifiers
String s = " 1, 2 ,3, 1 ,2\t, 3 ";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(
new List<String> { " 1", " 2 ", "3", " 1 ", "2\t", " 3 " },
substrings);
}
[TestMethod()]
public void SplitQualified_04()
{
// Example with no whitespace and trivial qualifiers.
String s = "1,\"2\",3,1,2,\"3\"";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
s = "\"1\",\"2\",3,1,\"2\",3";
substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_05()
{
// Example with no whitespace and qualifiers that enclose delimiters
String s = "1,\"2,2a\",3,1,2,\"3,3a\"";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2,2a", "3", "1", "2", "3,3a" },
substrings);
s = "\"1,1a\",\"2,2b\",3,1,\"2,2c\",3";
substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1,1a", "2,2b", "3", "1", "2,2c", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_06()
{
// Example with qualifiers enclosing whitespace but no delimiter
String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_07()
{
// Example with qualifiers enclosing whitespace but no delimiter
String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", "2 ", "3", "1", "2", "\t3\t" },
substrings);
}
[TestMethod()]
public void SplitQualified_08()
{
// Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 \" , 3,1, 2 ,\" 3 \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_09()
{
// Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 \" , 3,1, 2 ,\" 3 \"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 ", " 3", "1", " 2 ", " 3 " },
substrings);
}
[TestMethod()]
public void SplitQualified_10()
{
// Example with qualifiers enclosing whitespace and delimiter
String s = "\" 1 \",\"2 , 2b \",3,1,2,\" 3,3c \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2 , 2b", "3", "1", "2", "3,3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_11()
{
// Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 , 2b \" , 3,1, 2 ,\" 3,3c \"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 , 2b ", " 3", "1", " 2 ", " 3,3c " },
substrings);
}
[TestMethod()]
public void SplitQualified_12()
{
// Example with tab characters between delimiters
String s = "\t1,\t2\t,3,1,\t2\t,\t3\t";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_13()
{
// Example with newline characters between delimiters
String s = "\n1,\n2\n,3,1,\n2\n,\n3\n";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_14()
{
// Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
String s = "\" 1 \",\"\"\"2 , 2b \"\"\",3,1,2,\" \"\"3,3c \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "\"2 , 2b \"", "3", "1", "2", "\"3,3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_14A()
{
// Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
String s = "\"\"\"1\"\"\"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "\"1\"" },
substrings);
}
[TestMethod()]
public void SplitQualified_15()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with no whitespace or qualifiers
String s = "1|2|3|1|2,2f|3";
var substrings = s.SplitQualified('|', '#', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2,2f", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_16()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with qualifiers enclosing whitespace and delimiter
String s = "# 1 #|#2 | 2b #|3|1|2|# 3|3c #";
// whitespace should be removed
var substrings = s.SplitQualified('|', '#', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2 | 2b", "3", "1", "2", "3|3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_17()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
String s = "# 1 #| #2 | 2b # | 3|1| 2 |# 3|3c #";
// whitespace should be preserved
var substrings = s.SplitQualified('|', '#', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 | 2b ", " 3", "1", " 2 ", " 3|3c " },
substrings);
}
I had a problem with a CSV that contains fields with a quote character in them, so using the TextFieldParser, I came up with the following:
private static string[] parseCSVLine(string csvLine)
{
using (TextFieldParser TFP = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvLine))))
{
TFP.HasFieldsEnclosedInQuotes = true;
TFP.SetDelimiters(",");
try
{
return TFP.ReadFields();
}
catch (MalformedLineException)
{
StringBuilder m_sbLine = new StringBuilder();
for (int i = 0; i < TFP.ErrorLine.Length; i++)
{
if (i > 0 && TFP.ErrorLine[i]== '"' &&(TFP.ErrorLine[i + 1] != ',' && TFP.ErrorLine[i - 1] != ','))
m_sbLine.Append("\"\"");
else
m_sbLine.Append(TFP.ErrorLine[i]);
}
return parseCSVLine(m_sbLine.ToString());
}
}
}
A StreamReader is still used to read the CSV line by line, as follows:
using(StreamReader SR = new StreamReader(FileName))
{
while (SR.Peek() >-1)
myStringArray = parseCSVLine(SR.ReadLine());
}
With Cinchoo ETL - an open source library, it can automatically handles columns values containing separators.
string csv = #"2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,""Corvallis, OR"",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
using (var p = ChoCSVReader.LoadText(csv)
)
{
Console.WriteLine(p.Dump());
}
Output:
Key: Column1 [Type: String]
Value: 2
Key: Column2 [Type: String]
Value: 1016
Key: Column3 [Type: String]
Value: 7/31/2008 14:22
Key: Column4 [Type: String]
Value: Geoff Dalgas
Key: Column5 [Type: String]
Value: 6/5/2011 22:21
Key: Column6 [Type: String]
Value: http://stackoverflow.com
Key: Column7 [Type: String]
Value: Corvallis, OR
Key: Column8 [Type: String]
Value: 7679
Key: Column9 [Type: String]
Value: 351
Key: Column10 [Type: String]
Value: 81
Key: Column11 [Type: String]
Value: b437f461b3fd27387c5d8ab47a293d35
Key: Column12 [Type: String]
Value: 34
For more information, please visit codeproject article.
Hope it helps.

Swift splitting "abc1.23.456.7890xyz" into "abc", "1", "23", "456", "7890" and "xyz"

In Swift on OS X I am trying to chop up the string "abc1.23.456.7890xyz" into these strings:
"abc"
"1"
"23"
"456"
"7890"
"xyz"
but when I run the following code I get the following:
=> "abc1.23.456.7890xyz"
(0,3) -> "abc"
(3,1) -> "1"
(12,4) -> "7890"
(16,3) -> "xyz"
which means that the application correctly found "abc", the first token "1", but then the next token found is "7890" (missing out "23" and "456") followed by "xyz".
Can anyone see how the code can be changed to find ALL of the strings (including "23" and "456")?
Many thanks in advance.
import Foundation
import XCTest
public
class StackOverflowTest: XCTestCase {
public
func testRegex() {
do {
let patternString = "([^0-9]*)([0-9]+)(?:\\.([0-9]+))*([^0-9]*)"
let regex = try NSRegularExpression(pattern: patternString, options: [])
let string = "abc1.23.456.7890xyz"
print("=> \"\(string)\"")
let range = NSMakeRange(0, string.characters.count)
regex.enumerateMatchesInString(string, options: [], range: range) {
(textCheckingResult, _, _) in
if let textCheckingResult = textCheckingResult {
for nsRangeIndex in 1 ..< textCheckingResult.numberOfRanges {
let nsRange = textCheckingResult.rangeAtIndex(nsRangeIndex)
let location = nsRange.location
if location < Int.max {
let startIndex = string.startIndex.advancedBy(location)
let endIndex = startIndex.advancedBy(nsRange.length)
let value = string[startIndex ..< endIndex]
print("\(nsRange) -> \"\(value)\"")
}
}
}
}
} catch {
}
}
}
It's all about your regex pattern. You want to find a series of contiguous letters or digits. Try this pattern instead:
let patternString = "([a-zA-Z]+|\\d+)"
alternative 'Swifty' way
let str = "abc1.23.456.7890xyz"
let chars = str.characters.map{ $0 }
enum CharType {
case Number
case Alpha
init(c: Character) {
self = .Alpha
if isNumber(c) {
self = .Number
}
}
func isNumber(c: Character)->Bool {
return "1234567890".characters.map{ $0 }.contains(c)
}
}
var tmp = ""
tmp.append(chars[0])
var type = CharType(c: chars[0])
for i in 1..<chars.count {
let c = CharType(c: chars[i])
if c != type {
tmp.append(Character("."))
}
tmp.append(chars[i])
type = c
}
tmp.characters.split(".", maxSplit: Int.max, allowEmptySlices: false).map(String.init)
// ["abc", "1", "23", "456", "7890", "xyz"]